Abstract

Subspace recovery from corrupted and missing data is crucial for various applications in signal processing and information
theory. To complete missing values and detect column corruptions, existing robust Matrix Completion (MC) methods mostly
concentrate on recovering a low-rank matrix from few corrupted coefficients w.r.t. the standard basis, which, however, does not
apply to more general bases, e.g., the Fourier basis. In this paper, we prove that the range space of an $m \times n$ matrix with
rank $r$ can be exactly recovered from few coefficients w.r.t. a general basis, even though $r$ and the number of corrupted
samples are both as high as $O(\min\{m, n\}/\log^3(m + n))$. Our model covers previous ones as special cases, and robust MC can
recover the intrinsic matrix with a higher rank. Moreover, we suggest a universal choice of the regularization parameter, which
is $\lambda = 1/\sqrt{\log n}$. Our $\ell_{2,1}$ filtering algorithm, which has theoretical guarantees, further reduces the
computational cost of our model. As an application, we also find that the solutions to extended robust Low-Rank Representation
and to our extended robust MC are mutually expressible, so both our theory and algorithm can be applied to the subspace
clustering problem with missing values under certain conditions. Experiments verify our theories.


Index Terms 

Robust Matrix Completion, General Basis, Subspace Recovery, Outlier Detection, $\ell_{2,1}$ Filtering Algorithm


I. Introduction 

We are now in an era of big and high-dimensional data. Unfortunately, due to storage difficulties and computational
obstacles, we can measure only a few entries of the data matrix. So restoring all of the information that the data carry from
the partial measurements is of great interest in data analysis. This challenging problem is known as the Matrix Completion
(MC) problem, which is closely related to the so-called recommendation system, where one tries to predict unrevealed user
preferences according to incomplete rating feedback. Admittedly, this inverse problem is ill-posed, as there are infinitely
many feasible solutions. Fortunately, most data are structured, e.g., face [1], texture [2], and motion [3], [4], [5].
They typically lie around low-dimensional subspaces. Because the rank of the data matrix corresponds to the dimensionality of
the subspace, recent work [6], [7], [8], [9] in convex optimization demonstrates a remarkable fact: it is possible to exactly
complete an $m \times n$ matrix of rank $r$, if the number of randomly selected matrix elements is no less than
$O((m + n) r \log^2(m + n))$.

Yet it is well known that the traditional MC model suffers from a robustness issue. It is sensitive even to minor corruptions,
which commonly occur due to sensor failures and uncontrolled environments. In the recommendation system, for instance,
malicious manipulation by even a single rater might drive the output of an MC algorithm far from the ground truth. To resolve
the issue, several efforts have been devoted to robustifying the MC model, among which robust MC [10] is the one with solid
theoretical analysis. Chen et al. [10] proved that robust MC is able to exactly recover the ground truth subspace and detect the
column corruptions (i.e., some entire columns are corrupted by noise), if the dimensionality of the subspace is not too high and
the corrupted columns are sparse compared with the input size. Most importantly, the observed expansion coefficients should
be sufficient w.r.t. the standard matrix basis $\{e_i e_j^*\}_{ij}$ (please refer to Table I for an explanation of notations).

However, recent advances in theoretical physics measure quantum-state entries by tomography w.r.t. the Pauli basis, which
is rather different from the standard matrix basis [8]. So it is not straightforward to apply the existing theory on robust MC
to such a special case. This paper tries to resolve the problem. More generally, we demonstrate the exact recoverability of an
extended robust MC model in the presence of only a few coefficients w.r.t. a general basis, although some columns of
the intrinsic matrix might be arbitrarily corrupted. By applying our $\ell_{2,1}$ filtering algorithm, which has theoretical
guarantees, we are able to speed up solving the model numerically. There are various applications of our results.

A. Practical Applications 

In numerical analysis, instead of the standard polynomial basis, the Legendre polynomials are widely used to
represent smooth functions due to their orthogonality. Such expansions, however, are typically sensitive to perturbation: a
small perturbation of the function might drive the fitting result arbitrarily far from the original. Moreover, to reduce the
storage and computational costs, sometimes we can record only a few expansion coefficients. This paper justifies the
possibility of completing the missing values and removing the outliers in this setting.

In digital signal processing, one usually samples the signals, e.g., voices and feature vectors, at random in the Fourier basis. 
However, due to sensor failure, a group of signals that we capture may be rather unreliable. To recover the intrinsic information 
that the signals carry and remove the outliers simultaneously, our theoretical analysis guarantees the success of robust MC 
w.r.t. the Fourier basis. 

In quantum information theory, to obtain a maximum likelihood estimation of a quantum state of 8 ions, one typically
requires hundreds of thousands of measurements w.r.t. the Pauli basis, which are unaffordable because of high experimental
costs. To overcome the difficulty, Gross [8] compressed the number of observations w.r.t. any basis by an MC model. However,
this model is fragile to severe corruptions, which commonly occur because of measurement errors. To robustify the model,
this paper justifies the exact recoverability of robust MC w.r.t. a general basis, even if the datasets are wildly corrupted.

In subspace clustering, one tries to segment the data points according to the subspaces they lie in, which can be widely
applied to motion segmentation [3], [4], [5], [11], [12], face classification [1], [13], [14], [15], system identification
[16], [17], [18], and image segmentation [19], [20]. Recently, it has become of great interest to cluster the subspaces while
the observations w.r.t. some coordinates are missing. To resolve the issue, as an application in this paper, our theorem relates
robust MC to a certain subspace clustering model, the so-called extended robust Low-Rank Representation (LRR). Thus one could
hope to correctly recover the structure of multiple subspaces, if robust MC is able to complete the unavailable values and
remove the outlier samples with an overwhelming probability. This is guaranteed by our paper.

B. Related Work 

Suppose that $L_0$ is an $m \times n$ data matrix of rank $r$ whose columns are sample points, and whose entries are partially
observed on the index set $\mathcal{K}_{obs}$. The MC problem aims at exactly recovering $L_0$, or the range space of $L_0$, from
the measured elements. Probably the most well-known MC model was proposed by Candès et al. [6]. To choose the lowest-rank
matrix that fits the observed entries, the original model is formulated as

$$\min_{L} \operatorname{rank}(L), \quad \text{s.t.} \quad \langle L, e_i e_j^* \rangle = \langle L_0, e_i e_j^* \rangle, \quad (i,j) \in \mathcal{K}_{obs}. \tag{1}$$

This model, however, is intractable because problem (1) is NP-hard. Inspired by recent work in compressive sensing, Candès
et al. replaced the rank in the objective function with the nuclear norm, which is the sum of singular values and is the convex
envelope of the rank on the unit ball of the matrix operator norm. Namely,

$$\min_{L} \|L\|_*, \quad \text{s.t.} \quad \langle L, e_i e_j^* \rangle = \langle L_0, e_i e_j^* \rangle, \quad (i,j) \in \mathcal{K}_{obs}. \tag{2}$$

It is worth noting that model (2) is only w.r.t. the standard matrix basis $\{e_i e_j^*\}_{ij}$. To extend the model to any
basis $\{\omega_{ij}\}_{ij}$, Gross [8] proposed a more general MC model:

$$\min_{L} \|L\|_*, \quad \text{s.t.} \quad \langle L, \omega_{ij} \rangle = \langle L_0, \omega_{ij} \rangle, \quad (i,j) \in \mathcal{K}_{obs}. \tag{3}$$

Models (2) and (3) both have solid theoretical guarantees: recent work [6], [7], [8], [9] showed that the models are able
to exactly recover the ground truth $L_0$ with an overwhelming probability, if $\mathcal{K}_{obs}$ is uniformly distributed among
all sets of cardinality $O((m + n) r \log^2(m + n))$. Unfortunately, these traditional MC models suffer from a robustness issue:
they are sensitive even to minor corruptions, which commonly occur due to sensor failures, uncontrolled environments, etc.

A parallel study to the MC problem is the so-called matrix recovery, namely, recovering the underlying data matrix $L_0$, or the
range space of $L_0$, from the corrupted data matrix $M = L_0 + S_0$, where $S_0$ is the noise. Probably the most widely used
method is Principal Component Analysis (PCA). However, PCA is fragile to outliers. Even a single but severe corruption may
wildly degrade the performance of PCA. To resolve the issue, much work has been devoted to robustifying PCA [21], [22], [23],
[24], [25], [26], [27], [28], [29], among which a simple yet successful model to remove column corruptions is robust PCA via
Outlier Pursuit:

$$\min_{L,S} \operatorname{rank}(L) + \lambda \|S\|_{2,1}, \quad \text{s.t.} \quad M = L + S, \tag{4}$$

and its convex relaxation

$$\min_{L,S} \|L\|_* + \lambda \|S\|_{2,1}, \quad \text{s.t.} \quad M = L + S. \tag{5}$$

Outlier Pursuit has theoretical guarantees: Xu et al. [30] and our previous work [31] proved that when the dimensionality of
the ground truth subspace is not too high and the column-wise corruptions are sparse compared with the sample size, Outlier
Pursuit is able to recover the range space of $L_0$ and detect the non-zero columns of $S_0$ with an overwhelming probability.
Nowadays, Outlier Pursuit has been widely applied to subspace clustering [32], image alignment [33], texture representation
[34], etc. Unfortunately, the model cannot handle the case of missing values, which significantly limits its working range in
practice.

It is worth noting that the pros and cons of the above-mentioned MC and Outlier Pursuit are mutually complementary. To
remedy both of their limitations, recent work [10] suggested combining the two models, resulting in robust MC, a
model that can complete the missing values and detect the column corruptions simultaneously. Specifically, it is formulated
as

$$\min_{L,S} \operatorname{rank}(L) + \lambda \|S\|_{2,1}, \quad \text{s.t.} \quad \langle M, e_i e_j^* \rangle = \langle L + S, e_i e_j^* \rangle, \quad (i,j) \in \mathcal{K}_{obs}. \tag{6}$$

Correspondingly, the relaxed form is

$$\min_{L,S} \|L\|_* + \lambda \|S\|_{2,1}, \quad \text{s.t.} \quad \langle M, e_i e_j^* \rangle = \langle L + S, e_i e_j^* \rangle, \quad (i,j) \in \mathcal{K}_{obs}. \tag{7}$$


Chen et al. [10] demonstrated the recoverability of model (7), namely, if the range space of $L_0$ is low-dimensional, the
observed entries are sufficient, and the column corruptions are sparse compared with the input size, one can hope to exactly
recover the range space of $L_0$ and detect the corrupted samples by robust MC with an overwhelming probability. It is well
reported that robust MC has been widely applied to recommendation systems and medical research [10]. However, the specific
basis $\{e_i e_j^*\}_{ij}$ in problem (7) limits its extension to more challenging tasks, such as those discussed in Section I-A.


C. Our Contributions 


In this paper, we extend robust MC to more general cases, namely, the expansion coefficients are observed w.r.t. a set of 
general basis. We are particularly interested in the exact recoverability of this extended model. Our contributions are as follows: 

• We demonstrate that the extended robust MC model succeeds with an overwhelming probability. This result broadens the
working range of traditional robust MC in three aspects: 1. the choice of basis in our model is no longer limited to the
standard one; 2. with slightly stronger yet reasonable incoherence (ambiguity) conditions, our result allows rank($L_0$) to
be as high as $O(n/\log^3 n)$ even when the numbers of corruptions and observations are both constant fractions of the total
input size. In comparison with the existing result, which requires that rank($L_0$) $= O(1)$, our analysis significantly extends
the succeeding range of the robust MC model; 3. we suggest that the regularization parameter be chosen as
$\lambda = 1/\sqrt{\log n}$, which is universal.

• We propose a so-called $\ell_{2,1}$ filtering algorithm to reduce the computational complexity of our model. Furthermore,
we establish theoretical guarantees for our algorithm, which relate directly to the incoherence of the low-rank
component.

• As an application, we relate the extended robust MC model to a certain subspace clustering model, extended robust
LRR. So both our theory and our algorithm for the extended robust MC can be applied to the subspace clustering problem
if the extended robust MC can exactly recover the data structure.


1) Novelty of Our Analysis Technique: In the analysis of the exact recoverability of the model, we divide the proof
of Theorem 1 into two parts: the exact recoverability of the column support and the exact recoverability of the column space.
We are able to attack the two problems separately thanks to the idea of expanding the objective function at well-designed
points, i.e., $(\tilde{L}, \tilde{S})$ for the recovery of the column support and $(\hat{L}, \hat{S})$ for the recovery of the
column space, respectively (see Sections IV-B and IV-C for details). This technique enables us to decouple the randomization
of $\mathcal{I}_0$ and $\Omega_{obs}$, and so construct the dual variables easily by standard tools like least squares and the
golfing scheme. We note that our framework is general. It not only can be applied to the proof for an easier model like Outlier
Pursuit [31] (though we will sacrifice a small polylog factor for the probability of outliers), but can also hopefully simplify
the proof for models with more complicated formulations. That is roughly the high-level intuition for why we can handle the
general basis in this paper.

In the analysis for our $\ell_{2,1}$ filtering algorithm, we take advantage of the low-rank property, namely, we recover a
small-sized seed matrix first and then use the linear representation to obtain the whole desired matrix. Our analysis employs
tools in the recent matrix concentration literature [35] to bound the size of the seed matrix, which relates directly to the
incoherence of the underlying matrix. This is consistent with the fact that, for a matrix with high incoherence, we typically
need to sample more columns in order to fully observe the maximal linearly independent group (see the algorithm in Section V
for the procedure).
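The seed-then-represent idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's $\ell_{2,1}$ filtering algorithm: it uses the standard basis, assumes an orthonormal basis `U` of the column space has already been recovered from a seed matrix (here it is taken from clean columns as a stand-in), and the names `represent_columns` and `tol` are hypothetical. Each remaining column is fitted by least squares on its observed entries, and a column with a non-negligible residual is flagged as an outlier.

```python
import numpy as np

def represent_columns(M, mask, U, tol=1e-8):
    """Fit every column of M on its observed rows with the basis U (m x r).

    Returns the completed columns and a boolean outlier flag per column.
    Hypothetical helper; assumes each column has at least r observed rows.
    """
    m, n = M.shape
    completed = np.zeros((m, n))
    is_outlier = np.zeros(n, dtype=bool)
    for j in range(n):
        rows = mask[:, j]                      # observed rows of column j
        coef, *_ = np.linalg.lstsq(U[rows], M[rows, j], rcond=None)
        residual = np.linalg.norm(U[rows] @ coef - M[rows, j])
        scale = max(np.linalg.norm(M[rows, j]), 1.0)
        is_outlier[j] = residual > tol * scale
        if not is_outlier[j]:
            completed[:, j] = U @ coef         # fill in the missing entries
    return completed, is_outlier

# Toy data: rank-3 matrix, two corrupted columns, about 70% of entries observed.
rng = np.random.default_rng(0)
m, n, r = 30, 40, 3
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
M = L0.copy()
M[:, [5, 17]] = rng.standard_normal((m, 2))    # column-wise corruptions
mask = rng.random((m, n)) < 0.7

# Stand-in for the seed step: a basis computed from a few clean columns.
U_seed, _, _ = np.linalg.svd(L0[:, :10], full_matrices=False)
U = U_seed[:, :r]

completed, is_outlier = represent_columns(M, mask, U)
```

The per-column least-squares fit is cheap because it only involves an $r$-dimensional unknown, which is the reason a small seed matrix can reduce the overall cost.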

The remainder of this paper is organized as follows. Section II describes the problem setup. Section III shows our theoretical
results, i.e., the exact recoverability of our model. In Section IV, we present the detailed proofs of our main results. Section
V proposes a novel $\ell_{2,1}$ filtering algorithm for the extended robust MC model, and establishes theoretical guarantees for
the algorithm. We show an application of our analysis to the subspace clustering problem, and demonstrate the validity of our
theory by experiments in Section VI. Finally, Section VII concludes the paper.


II. Problem Setup 

Suppose that $L_0$ is an $m \times n$ data matrix of rank $r$, whose columns are sample points. $S_0 \in \mathbb{R}^{m \times n}$
is a noise matrix, whose column support is sparse compared with the input size $n$. Let $M = L_0 + S_0$. Its expansion
coefficients w.r.t. a set of general basis $\{\omega_{ij}\}_{ij}$, $(i,j) \in \mathcal{K}_{obs}$, are partially observed. This
paper considers the exact recovery problem as defined below.

Definition 1 (Exact Recovery Problem). The exact recovery problem investigates whether the range space of $L_0$ and the
column support of $S_0$ can be exactly recovered from randomly selected coefficients of $M$ w.r.t. a general basis, provided that
some columns of $M$ are arbitrarily corrupted.

A similar problem was proposed in [36], [37], which recovered the whole matrices $L_0$ and $S_0$ themselves if $S_0$ has
element-wise support. However, it is worth noting that one can only hope to recover the range space of $L_0$ and the column
support of $S_0$ in Definition 1, because a corrupted column can be the sum of any vector in the range space of $L_0$ and another
appropriate vector [30], [10], [31]. Moreover, as existing work mostly concentrates on recovering a low-rank matrix from a
sampling of matrix elements, our exact recovery problem covers this situation as a special case.


A. Model Formulations 


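As our exact recovery problem defines, we study an extended robust MC model w.r.t. a set of general basis. To choose the
solution $L$ with the lowest rank, the original model is formulated as

$$\min_{L,S} \operatorname{rank}(L) + \lambda \|S\|_{2,1}, \quad \text{s.t.} \quad \langle M, \omega_{ij} \rangle = \langle L + S, \omega_{ij} \rangle, \quad (i,j) \in \mathcal{K}_{obs}, \tag{8}$$

where $\mathcal{K}_{obs}$ is the observation index set and $\{\omega_{ij}\}_{ij}$ is a set of orthonormal bases such that

$$\operatorname{Span}\{\omega_{ij}, \ i = 1, \ldots, m\} = \operatorname{Span}\{e_i e_j^*, \ i = 1, \ldots, m\}, \quad \forall j. \tag{9}$$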

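Unfortunately, problem (8) is NP-hard because the rank function is discrete. So we replace the rank in the objective function
with the nuclear norm, resulting in the relaxed formulation:

$$\min_{L,S} \|L\|_* + \lambda \|S\|_{2,1}, \quad \text{s.t.} \quad \langle M, \omega_{ij} \rangle = \langle L + S, \omega_{ij} \rangle, \quad (i,j) \in \mathcal{K}_{obs}. \tag{10}$$

For brevity, we also rewrite it as

$$\min_{L,S} \|L\|_* + \lambda \|S\|_{2,1}, \quad \text{s.t.} \quad \mathcal{R}(L + S) = \mathcal{R}(M), \tag{11}$$

where $\mathcal{R}(\cdot) = \sum_{(i,j) \in \mathcal{K}_{obs}} \langle \cdot, \omega_{ij} \rangle \omega_{ij}$ is an operator which
projects a matrix onto the space $\Omega_{obs} = \operatorname{Span}\{\omega_{ij}, \ (i,j) \in \mathcal{K}_{obs}\}$, i.e.,
$\mathcal{R} = \mathcal{P}_{\Omega_{obs}}$.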
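To make the sampling operator concrete, here is a minimal sketch under a simplifying assumption that is not stated in the paper: every basis element is taken to be $\omega_{ij} = w_i e_j^*$, where $w_i$ is the $i$-th column of one fixed unitary matrix `W` (the unitary DFT matrix below). Such a basis satisfies the column-wise span condition (9). The function name `sampling_operator` is hypothetical.

```python
import numpy as np

def sampling_operator(M, W, mask):
    """Apply R(M) = sum over observed (i, j) of <M, w_ij> w_ij.

    Assumes w_ij = W[:, i] e_j^*, so the coefficient of column j along
    W[:, i] is (W^* M)[i, j]; `mask` marks the observed index pairs.
    """
    coeffs = W.conj().T @ M          # expansion coefficients, column by column
    return W @ (mask * coeffs)       # keep observed coefficients, map back

rng = np.random.default_rng(0)
m, n = 8, 6
W = np.fft.fft(np.eye(m), norm="ortho")    # unitary DFT matrix (Fourier basis)
M = rng.standard_normal((m, n))
mask = rng.random((m, n)) < 0.5            # Bernoulli-sampled observations

RM = sampling_operator(M, W, mask)
RM_standard = sampling_operator(M, np.eye(m), mask)   # standard basis case
```

With `W` equal to the identity the operator reduces to entry-wise masking, which is the standard-basis robust MC setting; applying the operator twice gives the same result as applying it once, as expected of a projection.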

In this paper, we show that problem (10), or equivalently problem (11), exactly recovers the range space of $L_0$ and the
column support of $S_0$, if the rank of $L_0$ is no higher than $O(n/\log^3 n)$, and the numbers of corruptions and observations
are (nearly) constant fractions of the total input size. In other words, the original problem (8) can be well approximated by
the relaxed problem (10).
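As a side note on how the two terms of the relaxed objective are typically handled numerically, the sketch below gives the proximal maps of the nuclear norm and of the $\ell_{2,1}$ norm, which are the usual building blocks of first-order solvers for objectives of this form. This is a generic illustration, not the paper's $\ell_{2,1}$ filtering algorithm, and the function names are hypothetical.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal map of tau * (nuclear norm)."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vh

def column_shrink(X, tau):
    """Column-wise shrinkage: proximal map of tau * (l_{2,1} norm).

    Each column is scaled toward zero; columns with norm <= tau vanish.
    """
    norms = np.linalg.norm(X, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale

low_rank_part = svt(np.diag([3.0, 1.0]), 2.0)
sparse_part = column_shrink(np.array([[3.0, 0.1], [4.0, 0.0]]), 1.0)
```

The first map lowers the rank by zeroing small singular values, while the second zeroes entire columns, which mirrors the roles of the low-rank term and the column-sparse term in the model.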


B. Assumptions 

At first sight, it does not seem always possible to separate $M$ into a low-rank term plus a column-sparse one,
because there may not be sufficient information to avoid identifiability issues. These issues are reflected
in two aspects: the true low-rank term might be sparse, and the true sparse component might be low-rank, in which case we
cannot hope to identify the ground truth correctly. So we require several assumptions in order to avoid such unidentifiable
cases.

1) Incoherence Conditions on the Low-Rank Term: As an extreme example, suppose that the low-rank term has only one
non-zero entry, e.g., $e_1 e_1^*$. This matrix has a one in the top left corner and zeros elsewhere, thus being both low-rank
and sparse. So it is impossible to identify this matrix as the low-rank term correctly. Moreover, we cannot expect to recover
the range space of this matrix from a sampling of its entries, unless we observe nearly all of the elements.

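To resolve the issue, Gross [8] introduced the $\mu$-incoherence condition on the low-rank term $L \in \mathbb{R}^{m \times n}$
in problem (3) w.r.t. the general basis $\{\omega_{ij}\}_{ij}$:

$$\max_{ij} \|\mathcal{P}_{\mathcal{V}} \omega_{ij}\|_F^2 \le \frac{\mu r}{n}, \quad \text{(avoid column sparsity)} \tag{12a}$$

$$\max_{ij} \|\mathcal{P}_{\mathcal{U}} \omega_{ij}\|_F^2 \le \frac{\mu r}{m}, \quad \text{(avoid row sparsity)} \tag{12b}$$

$$\max_{ij} \langle UV^*, \omega_{ij} \rangle^2 \le \frac{\mu r}{mn}, \tag{12c}$$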

where $U \Sigma V^* \in \mathbb{R}^{m \times n}$ is the skinny SVD of $L$. Intuitively, as discussed in [8], [31], conditions
(12a), (12b), and (12c) assert that the singular vectors are reasonably spread out for small $\mu$. Because problem (3), which
is a noiseless version of problem (10), requires conditions (12a), (12b), and (12c) in its theoretical guarantees [8], we will
set the same incoherence conditions to analyze our model (10) as well. We argue that beyond (12a), conditions (12b) and (12c)
are indispensable for the exact recovery of the target matrix in our setting. As an example, let a few entries in the first row
of a matrix be non-zero while all other elements are zero. This matrix satisfies condition (12a) but does not satisfy (12b) and
(12c). In this scenario the probability of recovering its column space is not very high, as we cannot guarantee to sample those
uncorrupted non-zero entries when there is a large amount of noise.

Fig. 1. Illustration of the ambiguity condition. From left to right, $\|\mathcal{B}(S)\|$ increases and the data tend to lie in a low-dimensional subspace.
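For intuition about what the incoherence conditions measure, the following sketch computes the smallest $\mu$ for which each condition holds in the special case of the standard basis $\omega_{ij} = e_i e_j^*$. It assumes the conditions take the form $\max_{ij} \|\mathcal{P}_{\mathcal{V}} \omega_{ij}\|_F^2 \le \mu r/n$, $\max_{ij} \|\mathcal{P}_{\mathcal{U}} \omega_{ij}\|_F^2 \le \mu r/m$, and $\max_{ij} \langle UV^*, \omega_{ij} \rangle^2 \le \mu r/(mn)$; in the standard basis these reduce to row norms of $V$, row norms of $U$, and entries of $UV^*$. The function name is hypothetical.

```python
import numpy as np

def incoherence_standard_basis(L, tol=1e-10):
    """Smallest mu satisfying each of the three conditions for w_ij = e_i e_j^*."""
    m, n = L.shape
    U, s, Vh = np.linalg.svd(L, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))            # numerical rank
    U, V = U[:, :r], Vh[:r].conj().T
    mu_a = n / r * np.max(np.sum(np.abs(V) ** 2, axis=1))    # row norms of V
    mu_b = m / r * np.max(np.sum(np.abs(U) ** 2, axis=1))    # row norms of U
    mu_c = m * n / r * np.max(np.abs(U @ V.conj().T) ** 2)   # entries of U V^*
    return mu_a, mu_b, mu_c

spiky = np.zeros((4, 5))
spiky[0, 0] = 1.0                    # the matrix e_1 e_1^*: low-rank and sparse
flat = np.ones((4, 5))               # rank one with fully spread singular vectors

mu_spiky = incoherence_standard_basis(spiky)
mu_flat = incoherence_standard_basis(flat)
```

The single-entry matrix attains the largest possible values, growing with the matrix size, whereas the all-ones matrix attains the smallest value 1 in all three conditions.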

So we assume that the low-rank part $\tilde{L}$ satisfies conditions (12a), (12b), and (12c), and the low-rank component
$\hat{L}$ satisfies condition (12a), as in [31] (please refer to Table I for an explanation of notations). Though it is more
natural to assume incoherence on $L_0$, the following example shows that the incoherence of $L_0$ does not suffice to
guarantee the success of model (10) when the rank is relatively high:

Example 1. Compute $L_0 = XY^T$ as a product of $n \times r$ i.i.d. $\mathcal{N}(0, 1)$ matrices. The column support of $S_0$ is
sampled by a Bernoulli distribution with parameter $a$. Let the first entry of each non-zero column of $S_0$ be $n$ and all other
entries be zeros. Also set the observation matrix as $\mathcal{P}_{\Omega_{obs}}(L_0 + S_0)$, where $\Omega_{obs}$ is the set of
observed indices selected by i.i.d. Ber($p_0$). We adopt $n = 10{,}000$, $r = 0.1n$, $p_0 = 1$, and $a = 10/n$, so there is
around a constant number of corrupted samples in this example. Note that, here, $L_0$ is incoherent, fulfilling conditions
(12a), (12b), and (12c), while $\tilde{L}$ and $\hat{L}$ are not. However, the output of the algorithm falsely identifies all of
the corrupted samples as clean data. So the incoherence of $L_0$ cannot guarantee the exact recoverability of our model.
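The data in Example 1 can be generated along the following lines. This is a sketch only: it assumes $L_0$ is formed as $XY^T$ from two Gaussian factors, uses a much smaller default size than the $n = 10{,}000$ of the example so that it runs quickly, and the function name `make_example` and its arguments are hypothetical. It produces the inputs of the experiment, not the solver output.

```python
import numpy as np

def make_example(n=200, rank_frac=0.1, a=None, p0=1.0, seed=0):
    """Generate (L0, S0, support, mask, M_obs) in the style of Example 1."""
    rng = np.random.default_rng(seed)
    r = max(1, int(rank_frac * n))
    a = 10.0 / n if a is None else a
    X = rng.standard_normal((n, r))
    Y = rng.standard_normal((n, r))
    L0 = X @ Y.T                               # rank-r ground truth
    support = rng.random(n) < a                # Bernoulli(a) column support
    S0 = np.zeros((n, n))
    S0[0, support] = n                         # first entry of each outlier is n
    mask = rng.random((n, n)) < p0             # Bernoulli(p0) observations
    M_obs = np.where(mask, L0 + S0, 0.0)       # observed entries, zeros elsewhere
    return L0, S0, support, mask, M_obs

L0, S0, support, mask, M_obs = make_example()
```

Each outlier column of `S0` has a single non-zero entry, so the corruption is both column-sparse and row-sparse, which is what makes this instance hard to identify.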

Imposing incoherence conditions on $\tilde{L} = L_0 + \mathcal{P}_{\mathcal{I}_0} H_L$ and
$\hat{L} = L_0 + \mathcal{P}_{\mathcal{U}_0} H_L$ is not so surprising: there might be multiple solutions for the optimization
model, and the low-rank/sparse decompositions of $M$ are non-unique (depending on which solution we are considering). Since
$\tilde{L} + \tilde{S}$ and $\hat{L} + \hat{S}$ are two eligible decompositions of $M$ related to a fixed optimal
solution pair, it is natural to consider imposing incoherence on them. Specifically, we first assume incoherence conditions
(12a), (12b), and (12c) on $\tilde{L} = L_0 + \mathcal{P}_{\mathcal{I}_0} H_L$. Note that these conditions guarantee that matrix
$\tilde{L}$ cannot be sparse, so we can resolve the identifiability issue for the decomposition $M = \tilde{L} + \tilde{S}$ and
hopefully recover the index set $\mathcal{I}_0$. After that, the ambiguity between low-rankness and row sparseness is not an
issue any more, i.e., even for a row-sparse underlying matrix we can still expect to recover its column space. Here is an
example to illustrate this: suppose the low-rank matrix is $e_1 \mathbf{1}^*$, which has ones in the first row and zeros
elsewhere, and we know that some of the columns are corrupted by noise. Remove the outlier columns. Even if we cannot fully
observe the remaining entries, we can still expect to recover the column space Range($e_1$), since the information about the
range space is sufficient. Therefore, we only need to impose condition (12a) on
$\hat{L} = L_0 + \mathcal{P}_{\mathcal{U}_0} H_L$, which asserts that $\hat{L}$ cannot be column-sparse.

2) Ambiguity Conditions on Column-Sparse Term: Analogously, the column-sparse term $S$ has the identification issue as
well. Suppose that $S$ is a rank-1 matrix such that a constant fraction of the columns are zeros. This matrix is both low-rank
and column-sparse, and so cannot be correctly identified. To avoid this case, one needs the isotropic assumption [38], or the
following ambiguity condition, on the column-sparse term $S$, which is introduced by [31]:

$$\|\mathcal{B}(S)\| \le \mu', \tag{13}$$

where $\mu'$ can be any numerical constant. Here the isotropic assumption asserts that the covariance of the noise matrix is the
identity. In fact, many noise models satisfy this assumption, e.g., i.i.d. Gaussian noise. So the normalized noise vectors are
uniformly distributed on the surface of a unit sphere centered at the origin, and thus cannot lie in a low-dimensional subspace,
in other words, are not low-rank. Similarly, the ambiguity condition was proposed for the same purpose [31]. Geometrically,
the spectral norm stands for the length of the first principal direction (we use the operator $\mathcal{B}$ to remove the scaling
factor). So condition (13) asserts that the energy for each principal direction does not differ too much, namely, the data are
distributed around a ball (see Figure 1), and (13) holds once the directions of the non-zero columns of $S$ scatter sufficiently
randomly. Note that the isotropic assumption implies our ambiguity condition: if the columns of $S$ are isotropic,
$\|\mathcal{B}(S)\|$ would be a constant even though the size of the column support of $S$ is comparable to $n$. Thus our
ambiguity condition (13) is feasible. No matter what the number of non-zero columns of $S$ is, the assumption guarantees that
matrix $S$ is not low-rank.
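A quick numerical check can illustrate the ambiguity condition. The sketch below forms $\mathcal{B}(S)$ as the matrix of normalized non-zero columns of $S$ (zero columns stay zero) and compares its spectral norm for a rank-1 column-sparse matrix against one whose non-zero columns point in random directions. The sizes, the seed, and the helper name `normalized_columns` are illustrative assumptions.

```python
import numpy as np

def normalized_columns(S, tol=1e-12):
    """Return B(S): non-zero columns of S scaled to unit length, others zero."""
    norms = np.linalg.norm(S, axis=0)
    B = np.zeros_like(S, dtype=float)
    nz = norms > tol
    B[:, nz] = S[:, nz] / norms[nz]
    return B

rng = np.random.default_rng(0)
m, n, k = 50, 200, 20
cols = rng.choice(n, size=k, replace=False)     # column support of S

# Rank-1 outliers: every non-zero column is a positive multiple of one vector.
S_rank1 = np.zeros((m, n))
S_rank1[:, cols] = np.outer(rng.standard_normal(m), rng.uniform(1.0, 2.0, k))

# Isotropic outliers: non-zero columns are i.i.d. Gaussian vectors.
S_iso = np.zeros((m, n))
S_iso[:, cols] = rng.standard_normal((m, k))

norm_rank1 = np.linalg.norm(normalized_columns(S_rank1), 2)   # spectral norm
norm_iso = np.linalg.norm(normalized_columns(S_iso), 2)
```

In the rank-1 case all $k$ unit columns coincide, so the spectral norm equals $\sqrt{k}$ and grows with the number of outliers, while randomly scattered directions keep it much smaller.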

3) Probability Model: Our main results assume that the column support of $S_0$ and the entry support of the measured set
$\mathcal{K}_{obs}$ obey i.i.d. Bernoulli distributions with parameter $a$ and parameter $p_0$, respectively. Such assumptions
are mild because we have no further information on the positions of the outliers and measurements. More specifically, we assume
that $[S_0]_{:j} = [\delta_0]_j [Z_0]_{:j}$ throughout our proof, where $[\delta_0]_j \sim$ Ber($p$) determines the outlier
positions and $[Z_0]_{:j}$ determines the outlier values. If an event holds with a probability at least $1 - O(n^{-10})$, we say
that the event happens with an overwhelming probability.
4) Other Assumptions: Obviously, to guarantee the exact recovery of Range($L_0$), the noiseless samples
$\mathcal{P}_{\mathcal{I}_0^\perp} L_0$ should span the same space as Range($L_0$), i.e.,
Range($L_0$) = Range($\mathcal{P}_{\mathcal{I}_0^\perp} L_0$). Otherwise, only a subspace of Range($L_0$) can be
recovered, because the noise may be arbitrarily severe. So without loss of generality, we assume
$L_0 = \mathcal{P}_{\mathcal{I}_0^\perp} L_0$, as in [30], [10]. Moreover, the noise should be identifiable, namely, it cannot
lie in the ground truth Range($L_0$).


C. Summary of Main Notations 

In this paper, matrices are denoted by capital symbols. For a matrix $M$, we write $M_{:j}$ for the $j$th column of $M$. We
denote by $M_{ij}$ the entry at the $i$th row and $j$th column of the matrix. For matrix operators, $M^*$ and $M^\dagger$
represent the conjugate transpose and the Moore-Penrose pseudo-inverse of $M$, respectively, and $|M|$ stands for the matrix
whose $(i,j)$-th entry is $|M_{ij}|$.

Several norms appear in this paper, both for vectors and for matrices. The only vector norm we use is $\|\cdot\|_2$, which
stands for the Euclidean norm or the vector $\ell_2$ norm. For matrix norms, we denote by $\|\cdot\|_*$ the nuclear norm, which
stands for the sum of singular values. The matrix norm analogous to the vector $\ell_2$ norm is the Frobenius norm, represented
by $\|\cdot\|_F$. The pseudo-norms $\|\cdot\|_0$ and $\|\cdot\|_{2,0}$ denote the number of non-zero entries and non-zero
columns of a matrix, respectively; they are not real norms because absolute homogeneity does not hold. The convex surrogates of
$\|\cdot\|_0$ and $\|\cdot\|_{2,0}$ are the matrix $\ell_1$ and $\ell_{2,1}$ norms, with definitions
$\|M\|_1 = \sum_{ij} |M_{ij}|$ and $\|M\|_{2,1} = \sum_j \|M_{:j}\|_2$, respectively. The dual norms of the matrix $\ell_1$
and $\ell_{2,1}$ norms are the $\ell_\infty$ and $\ell_{2,\infty}$ norms, represented by $\|M\|_\infty = \max_{ij} |M_{ij}|$ and
$\|M\|_{2,\infty} = \max_j \|M_{:j}\|_2$. We also denote the operator norm of an operator $\mathcal{P}$ as
$\|\mathcal{P}\| = \sup_{\|M\|_F = 1} \|\mathcal{P} M\|_F$.
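For readers who want to check the definitions numerically, the norms above can be computed as follows. This is a small illustrative helper (the name `matrix_norms` and the dictionary keys are hypothetical), using only standard NumPy calls.

```python
import numpy as np

def matrix_norms(M):
    """Compute the matrix norms and pseudo-norms used in the paper."""
    col_norms = np.linalg.norm(M, axis=0)            # l2 norm of each column
    return {
        "nuclear": np.linalg.norm(M, "nuc"),         # sum of singular values
        "frobenius": np.linalg.norm(M, "fro"),
        "l0": int(np.count_nonzero(M)),              # number of non-zero entries
        "l20": int(np.count_nonzero(col_norms)),     # number of non-zero columns
        "l1": np.abs(M).sum(),
        "l21": col_norms.sum(),                      # sum of column norms
        "linf": np.abs(M).max(),
        "l2inf": col_norms.max(),                    # largest column norm
    }

example = np.array([[3.0, 0.0, 0.0],
                    [4.0, 0.0, 1.0]])
norms = matrix_norms(example)
```

For this example the column norms are 5, 0, and 1, so the $\ell_{2,1}$ norm is 6 and the $\ell_{2,\infty}$ norm is 5, while the $\ell_1$ norm is 8.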

Our analysis involves linear spaces as well. For example, $\mathcal{I}$ and Supp($L$) (similarly define $\mathcal{I}_0$ for
$L_0$; we will not restate this for the following notations) denote the column support of matrix $L$. Without confusion, it
forms a linear subspace. We use $\Omega$ to represent the element support of a matrix, as well as the corresponding linear
subspace. The column space of a matrix is written as script $\mathcal{U}$, while the row space is written as script
$\mathcal{V}$ or Row($L$). For any space $\mathcal{X}$, $\mathcal{X}^\perp$ stands for the orthogonal complement of space
$\mathcal{X}$.

We also discuss some special matrices and spaces in our analysis. For example, $(L_0, S_0)$ denotes the ground truth. We
represent $(L^*, S^*) = (L_0 + H_L, S_0 - H_S)$ as the optimal solutions of our model, where $H_L$ and $H_S$ guarantee the
feasibility of the solution. We are especially interested in expanding the objective function at some particular points: for the
exact recovery of the column support, we focus on
$(\tilde{L}, \tilde{S}) = (L_0 + \mathcal{P}_{\mathcal{I}_0} H_L, S_0 - \mathcal{P}_{\mathcal{I}_0} H_S)$; for the exact recovery
of the column space, we focus on
$(\hat{L}, \hat{S}) = (L_0 + \mathcal{P}_{\mathcal{I}_0} \mathcal{P}_{\mathcal{U}_0} H_L, S_0 - \mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0} \mathcal{P}_{\mathcal{U}_0} H_L)$.
Another matrix we are interested in is $\mathcal{B}(S)$, which consists of the normalized non-zero columns of $S$ and belongs to
the subdifferential of the $\ell_{2,1}$ norm. Similarly, the space
$\mathcal{T} = \{UX^* + YV^*, \ \forall X \in \mathbb{R}^{n \times r}, Y \in \mathbb{R}^{m \times r}\}$ is highly related to the
subgradient of the nuclear norm. Namely, the subgradient of the nuclear norm can be written in closed form as a term in
$\mathcal{T}$ plus a term in $\mathcal{T}^\perp$. The projection operator onto the space $\mathcal{T}^\perp$ is denoted
by $\mathcal{P}_{\mathcal{T}^\perp}$, which equals $\mathcal{P}_{\mathcal{U}^\perp} \mathcal{P}_{\mathcal{V}^\perp}$.

Table I summarizes the main notations used in this paper.


III. Exact Recoverability of the Model

Our main results in this paper show that, surprisingly, model (10) is able to exactly recover the range space of $L_0$ and
identify the column support of $S_0$ with a closed-form regularization parameter, even when only a small number of expansion
coefficients are measured w.r.t. a general basis and a constant fraction of columns are arbitrarily corrupted. Our theorem is as
follows:


Theorem 1 (Exact Recoverability Under Bernoulli Sampling). Any solution $(L^*, S^*)$ to the extended robust MC (10) with
$\lambda = 1/\sqrt{\log n}$ exactly recovers the column space of $L_0$ and the column support of $S_0$ with a probability at
least $1 - cn^{-10}$, if the column support $\mathcal{I}_0$ of $S_0$ obeys i.i.d. Ber($a$), the support $\mathcal{K}_{obs}$
obeys i.i.d. Ber($p_0$), and

$$\operatorname{rank}(L_0) \le \rho_r \frac{n_{(2)}}{\mu (\log n_{(1)})^3}, \quad a \le \rho_a \frac{n_{(2)}}{\mu n (\log n_{(1)})^3}, \quad p_0 \ge \rho_p, \tag{14}$$

where $c$, $\rho_r < 1$, $\rho_a < 1$, and $\rho_p < 1$ are all constants independent of each other, and $\mu$ is the
incoherence parameter in (12a), (12b), and (12c).
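To get a feel for the quantities in the theorem, the sketch below evaluates the suggested regularization parameter $\lambda = 1/\sqrt{\log n}$ and the right-hand side of a rank bound of the form $\rho_r n_{(2)} / (\mu (\log n_{(1)})^3)$. The theorem does not specify the value of the constant $\rho_r$, so the value passed below is purely a placeholder, and the function names are hypothetical.

```python
import math

def universal_lambda(n):
    """Regularization parameter lambda = 1 / sqrt(log n), n = number of columns."""
    return 1.0 / math.sqrt(math.log(n))

def rank_bound(m, n, mu, rho_r):
    """Evaluate rho_r * n_(2) / (mu * (log n_(1))^3) for given constants.

    rho_r is an unspecified constant in the theorem; any value used here
    is a placeholder for illustration only.
    """
    n1, n2 = max(m, n), min(m, n)
    return rho_r * n2 / (mu * math.log(n1) ** 3)

lam = universal_lambda(10_000)
bound = rank_bound(m=10_000, n=10_000, mu=1.0, rho_r=1.0)
```

Note that the parameter depends only on the number of columns and not on the rank, the noise level, or the sampling rate, which is what makes the choice universal.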


Remark 1. According to [39], a recovery result under the Bernoulli model with parameter $p$ automatically implies a
corresponding result for the uniform model with parameter $\Theta(np)$ with an overwhelming probability. So conditions (14) are
equivalent to

$$\operatorname{rank}(L_0) \le \rho_r \frac{n_{(2)}}{\mu (\log n_{(1)})^3}, \quad s \le \rho'_s \frac{n_{(2)}}{\mu (\log n_{(1)})^3}, \quad k \ge \rho'_p n_{(1)} n_{(2)}, \tag{15}$$

where the column support $\mathcal{I}_0$ of $S_0$ is uniformly distributed among all sets of cardinality $s$, the support
$\mathcal{K}_{obs}$ is uniformly distributed among all sets of cardinality $k$, and $\rho_r$, $\rho'_s$, and $\rho'_p$ are
numerical constants.
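TABLE I

Summary of main notations used in the paper.

Notation | Meaning | Notation | Meaning
$\mathcal{I}$ | Column support. | $\Omega$ | Element support.
$\mathcal{I}_0$ | $\mathcal{I}_0 \sim \mathrm{Ber}(a)$. | $\Omega_{obs}$ | $\Omega_{obs} \sim \mathrm{Ber}(p_0)$.
$\Gamma$ | $\Gamma = \mathcal{I}_0^\perp \cap \Omega_{obs}$. | $\Pi$ | $\Pi = \mathcal{I}_0 \cap \Omega_{obs}$.
$m, n$ | Size of the data matrix $M$. | $n_{(1)}, n_{(2)}$ | $n_{(1)} = \max\{m, n\}$, $n_{(2)} = \min\{m, n\}$.
$\Theta(n)$ | Grows in the same order of $n$. | $O(n)$ | Grows equal to or less than the order of $n$.
$\otimes$ | Tensor product. | $e_i$ | Vector whose $i$th entry is 1 and others are 0s.
Capital | A matrix. | $I, 0, \mathbf{1}$ | The identity matrix, all-zero matrix, and all-one vector.
$M_{:j}$ or $M^{(j)}$ | The $j$th column of matrix $M$. | $M_{ij}$ | The entry at the $i$th row and $j$th column of $M$.
$M^*$ | Conjugate transpose of matrix $M$. | $M^\dagger$ | Moore-Penrose pseudo-inverse of matrix $M$.
$|M|$ | Matrix whose $(i,j)$-th entry is $|M_{ij}|$. | $\|\cdot\|_2$ | $\ell_2$ norm for vector, $\|v\|_2 = \sqrt{\sum_i |v_i|^2}$.
$\|\cdot\|_*$ | Nuclear norm, the sum of singular values. | $\|\cdot\|_0$ | $\ell_0$ norm, number of non-zero entries.
$\|\cdot\|_{2,0}$ | $\ell_{2,0}$ norm, number of nonzero columns. | $\|\cdot\|_1$ | $\ell_1$ norm, $\|M\|_1 = \sum_{i,j} |M_{ij}|$.
$\|\cdot\|_{2,1}$ | $\ell_{2,1}$ norm, $\|M\|_{2,1} = \sum_j \|M_{:j}\|_2$. | $\|\cdot\|_{2,\infty}$ | $\ell_{2,\infty}$ norm, $\|M\|_{2,\infty} = \max_j \|M_{:j}\|_2$.
$\|\cdot\|_F$ | Frobenius norm, $\|M\|_F = \sqrt{\sum_{i,j} |M_{ij}|^2}$. | $\|\cdot\|_\infty$ | Infinity norm, $\|M\|_\infty = \max_{ij} |M_{ij}|$.
$\|\mathcal{P}\|$ | (Matrix) operator norm. | $L_0, S_0$ | Ground truth.
$L^*, S^*$ | Optimal solutions, $L^* = L_0 + H_L$, $S^* = S_0 - H_S$. | $\hat{L}, \hat{S}$ | $\hat{L} = L_0 + \mathcal{P}_{\mathcal{I}_0} H_L$, $\hat{S} = S_0 - \mathcal{P}_{\mathcal{I}_0} H_S$.
$\tilde{L}, \tilde{S}$ | $\tilde{L} = L_0 + \mathcal{P}_{\mathcal{U}_0} H_L$, $\tilde{S} = S_0 - \mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{U}_0} H_L$. | $\hat{U}, \hat{V}$ | Left and right singular vectors of $\hat{L}$.
$\mathcal{U}_0, \hat{\mathcal{U}}, \mathcal{U}^*$ | Column space of $L_0$, $\hat{L}$, $L^*$. | $\mathcal{V}_0, \hat{\mathcal{V}}, \mathcal{V}^*$ | Row space of $L_0$, $\hat{L}$, $L^*$.
$\hat{\mathcal{T}}$ | Space $\hat{\mathcal{T}} = \{\hat{U} X^* + Y \hat{V}^*, \forall X \in \mathbb{R}^{n \times r}, Y \in \mathbb{R}^{m \times r}\}$. | $\mathcal{X}^\perp$ | Orthogonal complement of the space $\mathcal{X}$.
$\mathcal{P}_{\mathcal{U}}, \mathcal{P}_{\mathcal{V}}$ | $\mathcal{P}_{\mathcal{U}} M = U U^* M$, $\mathcal{P}_{\mathcal{V}} M = M V V^*$. | $\mathcal{P}_{\mathcal{T}^\perp}$ | $\mathcal{P}_{\mathcal{T}^\perp} M = \mathcal{P}_{\mathcal{U}^\perp} \mathcal{P}_{\mathcal{V}^\perp} M$.
$\mathcal{I}_0, \hat{\mathcal{I}}, \mathcal{I}^*$ | Index of outliers of $S_0$, $\hat{S}$, $S^*$. | $|\mathcal{I}_0|$ | Outliers number of $S_0$.
$X \in \mathcal{I}$ | The column support of $X$ is a subset of $\mathcal{I}$. | $B(S)$ | $B(S) = \{H : \mathcal{P}_{\mathcal{I}^\perp}(H) = 0;\ H_{:j} = S_{:j}/\|S_{:j}\|_2,\ j \in \mathcal{I}\}$.
$\sim \mathrm{Ber}(p)$ | Obeys Bernoulli distribution with parameter $p$. | $\mathcal{N}(a, b^2)$ | Gaussian distribution (mean $a$ and variance $b^2$).
Row($M$) | Row space of matrix $M$. | Supp($M$) | Column support of matrix $M$.
$\sigma_i(M)$ | The $i$th singular value of matrix $M$. | $\lambda_i(M)$ | The $i$th eigenvalue of matrix $M$.
$\omega_{ij}$ | General basis. | |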
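The column-wise norms in Table I are easy to mix up, so a minimal NumPy sketch may help. The function names below are illustrative choices and are not taken from the paper; the definitions follow the table ($\ell_{2,1}$ sums the column norms, $\ell_{2,\infty}$ takes their maximum, $\ell_{2,0}$ counts the nonzero columns).

```python
import numpy as np

def l21_norm(M):
    """l_{2,1} norm: sum of the Euclidean norms of the columns."""
    return float(np.sum(np.linalg.norm(M, axis=0)))

def l2inf_norm(M):
    """l_{2,inf} norm: largest Euclidean norm among the columns."""
    return float(np.max(np.linalg.norm(M, axis=0)))

def l20_norm(M, tol=0.0):
    """l_{2,0} norm: number of columns whose norm exceeds tol."""
    return int(np.sum(np.linalg.norm(M, axis=0) > tol))

def nuclear_norm(M):
    """Nuclear norm: sum of the singular values."""
    return float(np.sum(np.linalg.svd(M, compute_uv=False)))

def column_support(M, tol=0.0):
    """Indices of the nonzero columns (0-based)."""
    return np.flatnonzero(np.linalg.norm(M, axis=0) > tol)

# Small example: column norms are 5, 0 and 1.
M = np.array([[3.0, 0.0, 0.0],
              [4.0, 0.0, 1.0]])
print(l21_norm(M), l2inf_norm(M), l20_norm(M), column_support(M))
```

The nuclear norm is never smaller than the Frobenius norm, which gives a quick sanity check on any implementation.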



A. Comparison to Previous Results 

In the traditional low-rank MC problem, one seeks to complete a low-rank matrix from only a few measurements without 
corruptions. Recently, it has been shown that a constant fraction of the entries are allowed to be missing, even if the rank of
intrinsic matrix is as high as $O(n/\log^2 n)$. So compared with the result, our bound in Theorem 1 is tight up to a polylog factor.
Note that the polylog gap comes from the consideration of arbitrary corruptions in our analysis. When $a = 0$, our theorem
partially recovers the results of HJ. 

In the traditional low-rank matrix recovery problem, one tries to recover a low-rank matrix, or the range space of matrix, 
from fully observed corrupted data. To this end, our previous work ED demonstrated that a constant fraction of the columns 
can be corrupted, even if the rank of intrinsic matrix is as high as $O(n/\log n)$. Compared with the result, our bound in
Theorem 1 is tight up to a polylog factor as well, where the polylog gap comes from the consideration of missing values in
our analysis. When po = 1, our theorem partially recovers the results of ED- 

Probably the only low-rank model that can simultaneously complete the missing values, recover the ground truth subspace, 
and detect the corrupted samples is robust MC Qol. As a corollary, Chen et al. ifTOll showed that a constant fraction of columns 
and entries can be corrupted and missing, respectively, if the rank of $L_0$ is of order $O(1)$. Compared with this, though with
stronger incoherence (ambiguity) conditions, our work extends the working range of robust MC model to the rank of order
$O(n/\log^3 n)$. Moreover, our results consider a set of more general bases: when $\omega_{ij} = e_i e_j^*$, our theorem partially recovers
the results of Qo). 

Wright et al. E9ll produced a certificate of optimality for {Lq, Sq) for the Compressive Principal Component Pursuit, given 
that $(L_0, S_0)$ is the optimal solution for Principal Component Pursuit. There are significant differences between their work and
ours: 1. Their analysis assumed that certain entries are corrupted by noise, while our paper assumes that some whole columns
are noisy. In some sense, theoretical analysis on column noise is more difficult than that on Principal Component Pursuit ED- 
The most distinct difference is that we cannot expect our model to exactly recover $L_0$ and $S_0$. Rather, only the column space
of Lq and the column support of Sq can be exactly recovered ifTOl . EOl . 2. Wright et al.’s analysis is based on the assumption 
that $(L_0, S_0)$ can be recovered by Principal Component Pursuit, while our analysis is independent of this requirement.

IV. Complete Proofs of Theorem 1

Theorem 1 shows the exact recoverability of our extended robust MC model w.r.t. general basis. This section is devoted to
proving this result.













A. Proof Sketch 

We argue that it is not very straightforward to apply the existing proofs on Robust PCA/Matrix Completion to the case 
of general basis, since these proofs essentially require the observed entries and the outliers to be represented under the same 
basis IS). To resolve the issue, generally speaking, we novelly divide the proof of Theorem [T] into two parts: The exact 
recoverability of column support and the exact recoverability of column space. We are able to attack the two problems
separately thanks to the idea of expanding the objective function at well-designed points, i.e., $(\hat{L}, \hat{S})$ for the recovery of
column support and $(\tilde{L}, \tilde{S})$ for the recovery of column space, respectively (see Sections IV-B1 and IV-C1 for details). This
technique enables us to decouple the randomization of $\mathcal{I}_0$ and $\Omega_{obs}$, and so construct the dual variables easily by standard tools
like the least squares and golfing scheme. We notice that our framework is general. It not only can be applied to the proof 
for easier model like Outlier Pursuit ED (though we will sacrifice a small polylog factor for the probability a of outliers), 
but can also hopefully simplify the proof for model with more complicated formulation, e.g., decomposing the data matrix M 
into more than two structural components lf39l . That is roughly the high-level intuition why we can handle the general basis 
and improve over the previous work in this paper.

Specifically, for the exact recoverability of column support, we expand the objective function at $(\hat{L}, \hat{S})$ to establish our
first class of dual conditions. Though it is standard to construct dual variables by the golfing scheme, many lemmas need to be
generalized from the standard setting because of the existence of both $\mathcal{I}_0$ and $\Omega_{obs}$. All the preliminary work is done in Appendix
[A| When po = 1 or Q = 0, we claim that our lemmas return to the ones in llJTl . ifTOl . thus being more general. The idea 
behind the proofs is to fix $\mathcal{I}_0$ first and use the randomized argument for $\Omega_{obs}$ to have a one-step result, and then allow $\mathcal{I}_0$ to
be randomized to get our desired lemmas.

For the exact recoverability of column space, similarly, we expand the objective function at $(\tilde{L}, \tilde{S})$ to establish our second
class of dual conditions. We construct the dual variables by the least squares, and prove the correctness of our construction
by using generalized lemmas as well. To this end, we also utilize the ambiguity condition, which guarantees that the outlier
matrix cannot be low-rank. This enables us to improve the upper bound for the rank of the ground truth matrix from $O(1)$
to our $O(n/\log^3 n)$.

In summary, our proof proceeds in two parallel lines. The steps are as follows.

• We prove the exact recoverability of column support:

- Section IV-B1 proves the correctness of the dual conditions, as shown in Lemma 1. In particular, in the proof we focus
on the subgradient of the objective function at $(\hat{L}, \hat{S})$.

- Section IV-B2 shows the construction of dual variables and Section IV-B3 proves its correctness in Lemma 2.

• We then prove the exact recoverability of column space:

- Section IV-C1 proves the correctness of the dual conditions, as shown in Lemma 3. In particular, in the proof we focus
on the subgradient of the objective function at $(\tilde{L}, \tilde{S})$.

- Section IV-C2 shows the construction of dual variables (21), and Section IV-C3 proves its correctness in Lemma 5.

B. Exact Recovery of Column Support 

1) Dual Conditions: We first establish dual conditions for the exact recovery of the column support. The following lemma 
shows that once we can construct dual variables satisfying certain conditions (a.k.a. dual conditions), the column support of 
the outliers can be exactly recovered with a high probability by solving our robust MC model •HD- Basically, the proof is 
to find conditions which imply that 0 belongs to the subdifferential of the objective function at the desired low-rank and
column-sparse solution.

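Lemma 1. Let $(L^*, S^*) = (L_0 + H_L, S_0 - H_S)$ be any solution to the extended robust MC (11), $\hat{L} = L_0 + \mathcal{P}_{\mathcal{I}_0} H_L$, and
$\hat{S} = S_0 - \mathcal{P}_{\mathcal{I}_0} H_S$. Assume that $\|\mathcal{P}_{\Omega_{obs}^\perp} \mathcal{P}_{\hat{\mathcal{T}}}\| \le 1/2$, and

$\hat{U}\hat{V}^* + W = \lambda(F + \mathcal{P}_{\Omega_{obs}^\perp} D)$,

where $\mathcal{P}_{\hat{\mathcal{T}}} W = 0$, $\|W\| \le 1/2$, $\mathcal{P}_{\Omega_{obs}^\perp} F = 0$, $\|F\|_{2,\infty} \le 1/2$, and $\|\mathcal{P}_{\Omega_{obs}^\perp} D\|_F \le 1/4$. Then $S^*$ exactly recovers the column
support of $S_0$, i.e., $H_L \in \mathcal{I}_0$.

Proof: We first recall that the subgradients of nuclear norm and $\ell_{2,1}$ norm are as follows:

$\partial_{\hat{L}} \|\hat{L}\|_* = \{\hat{U}\hat{V}^* + Q : Q \in \hat{\mathcal{T}}^\perp, \|Q\| \le 1\}$,

$\partial_{\hat{S}} \|\hat{S}\|_{2,1} = \{B(\hat{S}) + E : E \in \hat{\mathcal{I}}^\perp, \|E\|_{2,\infty} \le 1\}$.

According to Lemma and the feasibility of $(L^*, S^*)$, $\mathcal{P}_{\Omega_{obs}} H_L = H_S$. Let $\hat{S} = S_0 - H_S + \mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L =
S_0 - \mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0} H_L \in \mathcal{I}_0$. Thus the pair $(\hat{L}, \hat{S})$ is feasible to problem (11). Then we have

$\|L_0 + H_L\|_* + \lambda \|S_0 - H_S\|_{2,1}$

$\ge \|\hat{L}\|_* + \lambda \|\hat{S}\|_{2,1} + \langle \hat{U}\hat{V}^* + Q, \mathcal{P}_{\mathcal{I}_0^\perp} H_L \rangle - \lambda \langle B(\hat{S}) + E, \mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L \rangle$.

Now adopt $Q$ such that $\langle Q, \mathcal{P}_{\mathcal{I}_0^\perp} H_L \rangle = \|\mathcal{P}_{\hat{\mathcal{T}}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_*$ and $\langle E, \mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L \rangle = -\|\mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_{2,1}$, and note that
$\langle B(\hat{S}), \mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L \rangle = 0$. So we have

$\|L_0 + H_L\|_* + \lambda \|S_0 - H_S\|_{2,1}$

$\ge \|\hat{L}\|_* + \lambda \|\hat{S}\|_{2,1} + \|\mathcal{P}_{\hat{\mathcal{T}}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_* + \lambda \|\mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_{2,1} + \langle \hat{U}\hat{V}^*, \mathcal{P}_{\mathcal{I}_0^\perp} H_L \rangle$.

Notice that

$|\langle \hat{U}\hat{V}^*, \mathcal{P}_{\mathcal{I}_0^\perp} H_L \rangle|$

$= |\langle W - \lambda F - \lambda \mathcal{P}_{\Omega_{obs}^\perp} D, \mathcal{P}_{\mathcal{I}_0^\perp} H_L \rangle|$

$\le \frac{1}{2} \|\mathcal{P}_{\hat{\mathcal{T}}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_* + \frac{\lambda}{2} \|\mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_{2,1} + \frac{\lambda}{4} \|\mathcal{P}_{\Omega_{obs}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F$.

So we have

$\|L_0 + H_L\|_* + \lambda \|S_0 - H_S\|_{2,1}$

$\ge \|\hat{L}\|_* + \lambda \|\hat{S}\|_{2,1} + \frac{1}{2} \|\mathcal{P}_{\hat{\mathcal{T}}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_* + \frac{\lambda}{2} \|\mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_{2,1} - \frac{\lambda}{4} \|\mathcal{P}_{\Omega_{obs}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F$.

Also, note that

$\|\mathcal{P}_{\Omega_{obs}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F$

$\le \|\mathcal{P}_{\Omega_{obs}^\perp} \mathcal{P}_{\hat{\mathcal{T}}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F + \|\mathcal{P}_{\Omega_{obs}^\perp} \mathcal{P}_{\hat{\mathcal{T}}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F$

$\le \|\mathcal{P}_{\hat{\mathcal{T}}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F + \frac{1}{2} \|\mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F$

$\le \|\mathcal{P}_{\hat{\mathcal{T}}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F + \frac{1}{2} \|\mathcal{P}_{\Omega_{obs}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F + \frac{1}{2} \|\mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F$.

That is

$\|\mathcal{P}_{\Omega_{obs}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F \le 2 \|\mathcal{P}_{\hat{\mathcal{T}}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F + \|\mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_F$.

Therefore, we have

$\|L_0 + H_L\|_* + \lambda \|S_0 - H_S\|_{2,1}$

$\ge \|\hat{L}\|_* + \lambda \|\hat{S}\|_{2,1} + \frac{1 - \lambda}{2} \|\mathcal{P}_{\hat{\mathcal{T}}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_* + \frac{\lambda}{4} \|\mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L\|_{2,1}$.

Since the pair $(L_0 + H_L, S_0 - H_S)$ is optimal to problem (11), we have

$\mathcal{P}_{\hat{\mathcal{T}}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H_L = 0$ and $\mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{I}_0^\perp} H_L = 0$,

i.e., $\mathcal{P}_{\mathcal{I}_0^\perp} H_L \in \hat{\mathcal{T}} \cap \Omega_{obs}^\perp = \{0\}$. So $H_L \in \mathcal{I}_0$. ■

By Lemma 1, to prove the exact recovery of column support, it suffices to show a dual certificate $W$ such that

(a) $W \in \hat{\mathcal{T}}^\perp$,
(b) $\|W\| \le 1/2$,
(c) $\|\mathcal{P}_{\Omega_{obs}^\perp}(\hat{U}\hat{V}^* + W)\|_F \le \lambda/4$,
(d) $\|\mathcal{P}_{\Omega_{obs}}(\hat{U}\hat{V}^* + W)\|_{2,\infty} \le \lambda/2$. (16)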

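2) Certification by Golfing Scheme: The remainder of the proofs is to construct $W$ which satisfies dual conditions (16).
Before introducing our construction, we assume that $\mathcal{K}_{obs} \sim \mathrm{Ber}(p_0)$ (for brevity, we also write it as $\Omega_{obs} \sim \mathrm{Ber}(p_0)$), or
equivalently $\Omega_{obs}^\perp \sim \mathrm{Ber}(1 - p_0)$. Note that $\Omega_{obs}$ has the same distribution as that of $\Omega_1 \cup \Omega_2 \cup \ldots \cup \Omega_{j_0}$, where each $\Omega_j$ is
drawn from $\mathrm{Ber}(q)$, $j_0 = \lceil \log n_{(1)} \rceil$, and $q$ fulfills $1 - p_0 = (1 - q)^{j_0}$ (note that $q = \Theta(1/\log n_{(1)})$ implies $p_0 = \Theta(1)$). We
construct $W$ based on such a distribution.

To construct $W$, we use the golfing scheme introduced by lO and llJTl . Let $Z_{j-1} = \mathcal{P}_{\hat{\mathcal{T}}}(\hat{U}\hat{V}^* - Y_{j-1})$. We construct $W$
by an inductive procedure:

$Y_j = Y_{j-1} + q^{-1} \mathcal{P}_{\Omega_j} Z_{j-1} = q^{-1} \sum_{k=1}^{j} \mathcal{P}_{\Omega_k} Z_{k-1}$, $W = \mathcal{P}_{\hat{\mathcal{T}}^\perp} Y_{j_0}$. (17)

Also, we have the inductive equation:

$Z_j = Z_{j-1} - q^{-1} \mathcal{P}_{\hat{\mathcal{T}}} \mathcal{P}_{\Omega_j} Z_{j-1}$. (18)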

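^By the duality between the nuclear norm and the operator norm, there exists a $Q$ such that $\langle Q, \mathcal{P}_{\hat{\mathcal{T}}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H \rangle = \|\mathcal{P}_{\hat{\mathcal{T}}^\perp} \mathcal{P}_{\mathcal{I}_0^\perp} H\|_*$ and $\|Q\| \le 1$. Thus
we take $\mathcal{P}_{\hat{\mathcal{T}}^\perp} Q \in \hat{\mathcal{T}}^\perp$ as the required $Q$. It holds similarly for $E$.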
3) Proofs of Dual Conditions: We now prove that the dual variables satisfy our dual conditions. The proof basically uses
the recursiveness of the dual variables that we construct.

Lemma 2. Assume that $\Omega_{obs} \sim \mathrm{Ber}(p_0)$ and $j_0 = \lceil \log n \rceil$. Then under the other assumptions of Theorem 1, $W$ given by
(17) obeys the dual conditions (16).

Proof: By Lemma 11, Lemma 12 and the inductive equation (18), when $q \ge c' \mu r \log n_{(1)} / (\epsilon^2 n_{(2)})$ for some $c'$, the following
inequalities hold with an overwhelming probability:

$\|Z_j\|_F \le \epsilon^j \|Z_0\|_F = \epsilon^j \|\hat{U}\hat{V}^*\|_F$,

$\max_{ab} |\langle Z_j, \omega_{ab} \rangle| \le \epsilon^j \max_{ab} |\langle Z_0, \omega_{ab} \rangle| = \epsilon^j \max_{ab} |\langle \hat{U}\hat{V}^*, \omega_{ab} \rangle|$.

Now we check the conditions in (16).

(a) The construction (17) implies the condition (a) holds.

(b) It holds that 


30 


k^l 

30 

k^l 

30 

- E - Zk-i\\ 

I 

/n(i) log 71(1) ,, 

max|(Zfc_i,a;ab)| 


fc=l 


<Co 

<Co 

= Co 
1 

“ 4 ’ 


1 /n (i) logrt(i) fJfF 

q V fnn 


1 — £ 
1 


Pr 


1 - ey <?(logn(i))2 


where the third inequality holds due to Lemma fol and the last inequality holds once q > 0(l/logn(i)). 

(c) Notice that Yj^ G Clobs, “ O^Then the following inequalities follow 

WVnjJUV*+ W)\\f = WVa^JUV* + rf^Y,J\F 

= \\'PajJUV*+Y,,-VfY,^)\\F 
= \\'PnjJUV*-rfY,,)\\F 

< Oo = riogn(i)l > logn(i)) 

(£<e-‘) 

n(i) 


( 19 ) 


< 


< 




vTT^(logn(i)) 3/2 

A 


(d) We hrst note that UV* + W = Zjg + Yjg. It follows from ( [T^ that 


\\'PnobsZjoh,oo < W'PfiabsZjoWF < e^°y/r < 












II 


Moreover, we have 


rO„.,>Soll2.oo = ||r,„||2,oo 

jo 

^ y^g~^||^afc^A;-l||2,oo 

k=l 

jo 

< q~'^y/rny^ max\{Zk-i,u!ab)\ 

ab 




< clogn(i) 

A 


where the fifth inequality holds once q > 0(l/logn(i)). Thus ||7^noba+ lT^)|| 2 ,oo 


< A/4. 


C. Exact Recovery of Column Space 

1) Dual Conditions: We then establish dual conditions for the exact recovery of the column space. The following lemma
shows that if we can construct dual variables satisfying certain conditions, the column space of the underlying matrix can be
exactly recovered with a high probability by solving model (11).

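Lemma 3 (Dual Conditions for Exact Column Space). Let $(L^*, S^*) = (L_0 + H_L, S_0 - H_S)$ be any solution to the extended
robust MC (11), $\tilde{L} = L_0 + \mathcal{P}_{\mathcal{U}_0} H_L$, and $\tilde{S} = S_0 - \mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{U}_0} H_L$. Suppose that $\tilde{\mathcal{V}} \cap \Gamma^\perp = \{0\}$ and

$W = \lambda(B(\tilde{S}) + F)$,

where $W \in \tilde{\mathcal{V}}^\perp \cap \Omega_{obs}$, $\|W\| \le 1/2$, $\mathcal{P}_{\Gamma^\perp} F = 0$, and $\|F\|_{2,\infty} \le 1/2$. Then $L^*$ exactly recovers the column space of $L_0$,
i.e., $H_L \in \mathcal{U}_0$.

Proof: We first recall that the subgradients of nuclear norm and $\ell_{2,1}$ norm are as follows:

$\partial_{\tilde{L}} \|\tilde{L}\|_* = \{\tilde{U}\tilde{V}^* + Q : Q \in \tilde{\mathcal{T}}^\perp, \|Q\| \le 1\}$,
$\partial_{\tilde{S}} \|\tilde{S}\|_{2,1} = \{B(\tilde{S}) + E : E \in \tilde{\mathcal{I}}^\perp, \|E\|_{2,\infty} \le 1\}$.

By the definition of subgradient, the inequality follows

$\|L_0 + H_L\|_* + \lambda \|S_0 - H_S\|_{2,1}$

$\ge \|\tilde{L}\|_* + \lambda \|\tilde{S}\|_{2,1} + \langle \tilde{U}\tilde{V}^* + Q, \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle - \lambda \langle B(\tilde{S}) + E, \mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle$

$\ge \|\tilde{L}\|_* + \lambda \|\tilde{S}\|_{2,1} + \langle \tilde{U}\tilde{V}^*, \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle + \langle Q, \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle - \lambda \langle B(\tilde{S}), \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle - \lambda \langle E, \mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle$.

Now adopt $Q$ such that $\langle Q, \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle = \|\mathcal{P}_{\tilde{\mathcal{V}}^\perp} \mathcal{P}_{\mathcal{U}_0^\perp} H_L\|_*$ and $\langle E, \mathcal{P}_{\Omega_{obs}} \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle = -\|\mathcal{P}_{\Gamma} \mathcal{P}_{\mathcal{U}_0^\perp} H_L\|_{2,1}$. We have

$\|L_0 + H_L\|_* + \lambda \|S_0 - H_S\|_{2,1}$
$\ge \|\tilde{L}\|_* + \lambda \|\tilde{S}\|_{2,1} + \|\mathcal{P}_{\tilde{\mathcal{V}}^\perp} \mathcal{P}_{\mathcal{U}_0^\perp} H_L\|_* + \lambda \|\mathcal{P}_{\Gamma} \mathcal{P}_{\mathcal{U}_0^\perp} H_L\|_{2,1}$
$- \lambda \langle B(\tilde{S}), \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle$.

Notice that

$|\langle -\lambda B(\tilde{S}), \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle| = |\langle \lambda F - W, \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle|$

$\le |\langle W, \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle| + \lambda |\langle F, \mathcal{P}_{\mathcal{U}_0^\perp} H_L \rangle|$

$\le \frac{1}{2} \|\mathcal{P}_{\tilde{\mathcal{V}}^\perp} \mathcal{P}_{\mathcal{U}_0^\perp} H_L\|_* + \frac{\lambda}{2} \|\mathcal{P}_{\Gamma} \mathcal{P}_{\mathcal{U}_0^\perp} H_L\|_{2,1}$.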


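^By the duality between the nuclear norm and the operator norm, there exists a $Q$ such that $\langle Q, \mathcal{P}_{\tilde{\mathcal{V}}^\perp} \mathcal{P}_{\mathcal{U}_0^\perp} H \rangle = \|\mathcal{P}_{\tilde{\mathcal{V}}^\perp} \mathcal{P}_{\mathcal{U}_0^\perp} H\|_*$ and $\|Q\| \le 1$. Thus

we take $\mathcal{P}_{\mathcal{U}_0^\perp} \mathcal{P}_{\tilde{\mathcal{V}}^\perp} Q \in \tilde{\mathcal{T}}^\perp$ as the required $Q$. It holds similarly for $E$.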
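Hence

$\|\tilde{L}\|_* + \lambda \|\tilde{S}\|_{2,1}$

$\ge \|L_0 + H_L\|_* + \lambda \|S_0 - H_S\|_{2,1}$

$\ge \|\tilde{L}\|_* + \lambda \|\tilde{S}\|_{2,1} + \frac{1}{2} \|\mathcal{P}_{\tilde{\mathcal{V}}^\perp} \mathcal{P}_{\mathcal{U}_0^\perp} H_L\|_* + \frac{\lambda}{2} \|\mathcal{P}_{\Gamma} \mathcal{P}_{\mathcal{U}_0^\perp} H_L\|_{2,1}$.

So $\mathcal{P}_{\mathcal{U}_0^\perp} H_L \in \tilde{\mathcal{V}} \cap \Gamma^\perp = \{0\}$, i.e., $H_L \in \mathcal{U}_0$. ■

The following lemma shows that one of the conditions in Lemma 3 holds true.

Lemma 4. Under the assumption of Theorem 1, $\tilde{\mathcal{V}} \cap \Gamma^\perp = \{0\}$.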

Proof: We first prove p{l — po)\\V:^Vr±M\\p < 2\\V^±Vr±M\\p for any matrix M. Let M' = Vp±M. Because 
VrV^M' PVrV^^M' = 0, we have \\VrV^M'\\p = \\VtV^^M'\\p < \\V^^M'\\p. Note that 

{p{l-Pt,))-^\\VrV^M'\\p 
= {p{1-po))-HVtV^M\VtV^M') 

= {V^M', (p(l - p:,))-^V^VrV^M') 

= {{p{l - p,))-^V^Vrr^ - V^)V^M') 

+ {r^M\r^M') 

> \\r^M'\\p-^\\r^M'\\p 
= l\\V^M'\\p, 

where the first inequality holds due to Corollary]^ So we have 

WV^.M'Wp > WrrVgM'Wp > ^^^~/°^ \\V^M'\\p, 
i.e., p(l — po)\\V^Vt^M\\p < 2\\V^±Vr^M\\p. 

Now let M G V n r-L. Then Vg^Vr^M = 0 while V^Vr^M = M. So p(l - Po)||AL||f < 0, i.e., M = 0. Therefore, 

vnr-L = {0}. _ ■ 

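By Lemma 3, to prove the exact recovery of column space, it suffices to show a dual certificate $W$ such that

$W \in \tilde{\mathcal{V}}^\perp \cap \Omega_{obs}$,
$\|W\| \le 1/2$,
$\mathcal{P}_{\Pi} W = \lambda B(\tilde{S})$, $\Pi = \mathcal{I}_0 \cap \Omega_{obs}$,
$\|\mathcal{P}_{\Gamma} W\|_{2,\infty} \le \lambda/2$. (20)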

2 ) Certification by Least Squares: The remainder of proofs is to construct W which satisfies the dual conditions ( |20| ). Note 
that X = Zq ~ Ber(p). To construct W, we consider the method of least squares, which is 

^ Xn)"S(5), (21) 

k>0 

where the Neumann series is well defined due to \\Vu'P^_^q± 7^n|| < 1- Indeed, note that 11 C ^lobs- So we have the identity: 

= 'Pni'P:^ + + ■■■)P'n 

= PnP‘i){Py + PvPo.^^Pv P ■■■)PvPn 
= PuP^iPy - PvPnfPv)~"PvP^ 

= PnPs){PvPn^,,Pv)-^PvPn. 

By Lemma pT| and the triangle inequality, we have that 1 — (1 — Po)~^\\P:fPQ.ahsPv\\ < 1/2^ i-6-, \\{P\;PQ.absPv)~^\\ ^ 
2/(1 — po)- Therefore, 

WPuP^pn^J^ = WPuPgpn^^PuW 

<2{1-por^r^Vuf 

< 2(1 -po)"V^ 

< 1, 


(22) 
where the second inequality holds due to Corollary]^ Note that V^iW = \B{S) and W G H flobs- So to prove the dual
conditions (20), it suffices to show that

(a) $\|W\| \le 1/2$,
(b) $\|\mathcal{P}_{\Gamma} W\|_{2,\infty} \le \lambda/2$. (23)

3) Proofs of Dual Conditions: We now prove that the dual variables that we construct above satisfy our dual conditions. 
Lemma 5. Under the assumptions of Theorem 1, $W$ given by (21) obeys dual conditions (20).
Proof: Let PL = I]fc>i(^n7^v , VnY- Then 


= xrc 


k>0 


(24) 


Now we check the two conditions in ( |2?| l. 

(a) By the assumption, we have ||S(S')|| < p!. Thus the first term in ( |24l i obeys 

1 


A 






B{S) <A B{S) 


< 


(25) 


For the second term, we have 


Then according to ( |2^ which states that ||7^y_|_Qi T’niP < 2cr^/(l — pf) = ctq with high probability. 


m<Y.of = - 


— 


< 1 . 


k>l 


So 


That is 




lIVLil < 2- 


(b) Let g stand for G = ■ Then W = XVg^^^^^^g(B(S)). Notice that g(B(S)) G Iq- Thus 

VrW = XV^^V^^^^^J{B{S)) 

= XVi^GiBiS)) - XV^^ G{B{S)) 

= -XV^^V^^^^J{B(S)). 

Now denote Q = "Py^Qj. G{B{S)). Note that 


= T.Ql = E 






^ 3 o 




30 i 


— E/ ^T’nPy_|_f2i^ 




<11^11 


VuV. 


v+n;! 


< 4 , Vj, 
where im = argmaxi \e-*GVQ'P^^Q± (eiepe^^l, Gj is a unitary matrix, and the second inequality holds because of fact ( |22l l. 
Thus llT^rW^lb.oo = M\'T^i^Q\\ 2 ,ao < ^IIQIb.oo < A/2. The proofs are completed. ■ 

V. Algorithm 

It is well known that robust MC can be efficiently solved by Alternating Direction Method of Multipliers (ADMM) ll40l . 
which is probably the most widely used method for solving nuclear norm minimization problems. In this section, we develop
a faster algorithm, termed $\ell_{2,1}$ filtering algorithm, to solve the same problem.


A. $\ell_{2,1}$ Filtering Algorithm

Briefly speaking, our $\ell_{2,1}$ filtering algorithm consists of two steps: recovering the ground truth subspace from a randomly
selected sub-column matrix, and then processing the remaining columns via $\ell_{2,1}$ norm based linear regression, which turns out
to be a least squares problem.

1) Recovering Subspace from a Seed Matrix: To speed up the algorithm, our strategy is to focus on a small-scaled subproblem 
from which we can recover the same subspace as solving the whole original problem ED. To this end, we partition the whole 
matrix into two blocks. Suppose that $r = \mathrm{rank}(L) \ll \min\{m, n\}$. We randomly sample $k$ columns from $M$ by i.i.d. $\mathrm{Ber}(d/n)$
(our Theorem 3 suggests choosing $d$ as $\Theta(r \log^3 n)$), forming a submatrix $M_l$ (for brevity, we assume that $M_l$ is the leftmost
submatrix of $M$). Then we can partition $M$, $L$, and $S$ accordingly:

$M = [M_l, M_r]$, $S = [S_l, S_r]$, $L = [L_l, L_r]$.

To recover the desired subspace Range($L_0$) from $M_l$, we solve a small-scaled problem:

$\min_{L_l, S_l} \|L_l\|_* + \frac{1}{\sqrt{\log k}} \|S_l\|_{2,1}$, s.t. $\mathcal{R}'(M_l) = \mathcal{R}'(L_l + S_l) \in \mathbb{R}^{m \times k}$, (26)

where $\mathcal{R}'(\cdot)$ is a linear mapping restricting $\mathcal{R}(\cdot)$ on the column index of $M_l$. As we will show in Section V-C, when the
Bernoulli parameter $d$ is no less than a lower bound, problem (26) exactly recovers the correct subspace Range($L_0$) and the
column support of $[S_0]_l$ with an overwhelming probability.

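2) $\ell_{2,1}$ Filtering Step: Since Range($L_l$) = Range($L_0$) at an overwhelming probability, each column of $L_r$ can be represented
as a linear combination of the columns of $L_l$. Namely, there exists a representation matrix $Q \in \mathbb{R}^{k \times (n-k)}$ such that

$L_r = L_l Q$.

Note that the part $S_r$ should have very sparse columns, so we use the following $\ell_{2,1}$ norm based linear regression problem to
explore the column supports of $S_r$:

$\min_{Q, S_r} \|S_r\|_{2,1}$, s.t. $\mathcal{R}'(M_r) = \mathcal{R}'(L_l Q + S_r)$. (27)

If we solve problem (27) directly by using ADMM, the complexity of our algorithm will be nearly the same as that
of solving the whole original problem. Fortunately, we can solve (27) column-wise due to the separability of $\ell_{2,1}$ norms. Let
$M_r^{(i)}$, $q^{(i)}$, and $S_r^{(i)}$ represent the $i$th column of $M_r$, $Q$, and $S_r$, respectively ($i = 1, \ldots, n-k$). Then problem (27) could be
decomposed into $n - k$ subproblems:

$\min_{q^{(i)}, S_r^{(i)}} \|S_r^{(i)}\|_2$, s.t. $\mathcal{R}'_i(M_r^{(i)}) = \mathcal{R}'_i(L_l q^{(i)} + S_r^{(i)}) \in \mathbb{R}^m$, $i = 1, \ldots, n-k$. (28)

Equivalently,

$\min_{q^{(i)}, S_r^{(i)}} \|\mathcal{Z}'_i(S_r^{(i)})\|_2$, s.t. $\mathcal{Z}'_i(M_r^{(i)}) = \mathcal{Y}'_i(L_l) q^{(i)} + \mathcal{Z}'_i(S_r^{(i)}) \in \mathbb{R}^{h_i}$, $i = 1, \ldots, n-k$, (29)

where $\mathcal{Z}'_i$ is an operator functioning on a vector which wipes out the unobserved elements, $\mathcal{Y}'_i$ is a matrix operator which wipes
out the corresponding rows of a matrix, and $h_i$ is the number of observed elements in the $i$th column. As least squares problems,
(28) admits closed-form solutions $q^{(i)} = \mathcal{Y}'_i(L_l)^\dagger \mathcal{Z}'_i(M_r^{(i)})$, $\mathcal{Z}'_i(S_r^{(i)}) = \mathcal{Z}'_i(M_r^{(i)}) - \mathcal{Y}'_i(L_l) \mathcal{Y}'_i(L_l)^\dagger \mathcal{Z}'_i(M_r^{(i)})$, $i = 1, \ldots, n-k$.

If $\mathcal{Z}'_i(S_r^{(i)}) \ne 0$, we infer that the column $M_r^{(i)}$ is corrupted by noises.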
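The per-column test can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' implementation: the function name `filter_column`, the relative tolerance, and the use of an orthonormal basis `U_l` of Range($L_l$) in place of $L_l$ itself are assumptions made here (an unpivoted QR of a rank-deficient $m \times k$ matrix would not isolate its range, so the sketch works with a full-column-rank basis).

```python
import numpy as np

def filter_column(U_l, m_col, observed, tol=1e-8):
    """Least-squares residual of one column on its observed rows.

    U_l      : (m, r) orthonormal basis of the recovered subspace (assumed given)
    m_col    : (m,) column of the data matrix; unobserved entries are ignored
    observed : (m,) boolean mask of observed rows
    Returns (is_outlier, residual on the observed rows).
    """
    A = U_l[observed, :]            # rows kept by the row-selection operator
    b = m_col[observed]             # observed entries of the column
    Q, _ = np.linalg.qr(A)          # reduced QR, Q has orthonormal columns
    residual = b - Q @ (Q.T @ b)    # b minus its projection onto Range(A)
    is_outlier = np.linalg.norm(residual) > tol * max(1.0, np.linalg.norm(b))
    return bool(is_outlier), residual

rng = np.random.default_rng(1)
m, r = 40, 3
U_l, _ = np.linalg.qr(rng.standard_normal((m, r)))
observed = np.ones(m, dtype=bool)
observed[::5] = False                       # drop every fifth row

clean = U_l @ rng.standard_normal(r)        # lies in the subspace
corrupt = rng.standard_normal(m)            # generic vector, not in the subspace

print(filter_column(U_l, clean, observed)[0])
print(filter_column(U_l, corrupt, observed)[0])
```

With noisy data the exact test "residual $\ne 0$" has to be replaced by a threshold, which is why the sketch exposes `tol`.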

We summarize our $\ell_{2,1}$ filtering algorithm in Algorithm 1.


B. Target Rank Estimation 

As we mentioned above, our algorithm requires the rank estimation r as an input. For some specific applications, e.g., 
background modeling llJTl and photometric stereo B3]| . the rank of the underlying matrix is known to us due to their physical 
properties. However, it is not always clear how to estimate the rank for some other cases. Here we provide a heuristic strategy 
for rank estimation. 

Our strategy is based on multiple trials of solving subproblem (26). Namely, starting from a small estimate of $r$, we
solve subproblem (26) by subsampling. If the optimal solution $L_l^*$ is such that $k/\mathrm{rank}(L_l^*) \ge \Theta(\log^3 n)$, we accept the $r$ and
output; otherwise, we increase $r$ by a fixed step (and so increase $k$) and repeat the procedure until $k/n > 0.5$. We require
$k/n \le 0.5$ because the speed advantage of our $\ell_{2,1}$ filtering algorithm vanishes if the low-rank assumption does not hold.
Algorithm 1 $\ell_{2,1}$ Filtering Algorithm for Exact Recovery of Subspace and Support

Input: Observed data matrix $\mathcal{R}(M)$ and estimated rank $r$ (see Section V-B).

1. Randomly sample columns from $\mathcal{R}(M) \in \mathbb{R}^{m \times n}$ by $\mathrm{Ber}(d/n)$ to form $\mathcal{R}(M_l) \in \mathbb{R}^{m \times k}$;
2. // Line 3 recovers the subspace from a seed matrix.
3. Solve small-scaled $m \times k$ problem (26) by ADMM and obtain $L_l$, Range($L_0$), and column support of $S_l$;
4. For $i$ from 1 to $n - k$
5.    Conduct QR factorization on the matrix $\mathcal{Y}'_i(L_l)$ as $\mathcal{Y}'_i(L_l) = Q_i R_i$;
6.    // Line 7 implements $\ell_{2,1}$ filtering to the remaining columns.
7.    Recover $\mathcal{Z}'_i(S_r^{(i)}) \in \mathbb{R}^{h_i}$ by solving (29), which is $\mathcal{Z}'_i(M_r^{(i)}) - Q_i(Q_i^* \mathcal{Z}'_i(M_r^{(i)}))$;
8.    If $\mathcal{Z}'_i(S_r^{(i)}) \ne 0$
9.       Output "$M_r^{(i)}$ is an outlier";
10.   End If
11. End For

Output: Low-dimensional subspace Range($L_0$) and column support of matrix $S_0$.


C. Theoretical Guarantees 


In this section, we establish theoretical guarantees for our $\ell_{2,1}$ filtering algorithm. Namely, Algorithm 1 is able to exactly
recover the range space of $L_0$ and the column support of $S_0$ with a high probability. To this end, we show that the two steps
in Section V-A succeed at overwhelming probabilities, respectively:

• To guarantee the exact recovery of Range($L_0$) from the seed matrix, we prove that the sampled columns in Line 1 exactly
span the desired subspace Range($L_0$) when the columns are restricted to the set $\mathcal{I}_0^\perp$, i.e., Range($\mathcal{P}_{\mathcal{I}_0^\perp} M_l$) = Range($L_0$) (see
Theorem 2); otherwise, only a subspace of Range($L_0$) can be recovered by Line 3. Applying Theorem 1, we justify that
Line 3 recovers the ground truth subspace from the seed matrix with an overwhelming probability (see Theorem 3).

• For the $\ell_{2,1}$ filtering step, we demonstrate that, though operator $\mathcal{Y}'_i$ randomly wipes out several rows of $L_l$, the columns of
$\mathcal{Y}'_i(L_l)$ exactly span Range($\mathcal{Y}'_i(L_0)$) with an overwhelming probability. So by checking whether the $i$th column belongs
to Range($\mathcal{Y}'_i(L_0)$), the least squares problem (29) suffices to examine whether a specific column is an outlier (see
Theorem 4).

1) Analysis for Recovering Subspace from a Seed Matrix: To guarantee the recovery of Range($L_0$) by Line 3, the sampled
columns in Line 1 should be informative. In other words, Range($L_0$) = Range($\mathcal{P}_{\mathcal{I}_0^\perp} M_l$). To select the smallest number of
columns in Line 1, we estimate the lower bound for the Bernoulli parameter $d/n$. Intuitively, this problem is highly connected
to the property of $\mathcal{P}_{\mathcal{I}_0^\perp} M$. For instance, suppose that in the worst case $\mathcal{P}_{\mathcal{I}_0^\perp} M$ is a matrix whose elements in the first column
are ones while all other elements equal zeros. In this case, Line 1 will select the first column (the only complete basis) at
a high probability if and only if $d = n$. But for $\mathcal{P}_{\mathcal{I}_0^\perp} M$ whose elements are all equal to ones, a much smaller $d$ suffices to
guarantee the success of sampling. Thus, to distinguish the two cases, we involve the incoherence in our analysis.

We now estimate the smallest Bernoulli parameter $d$ in Line 1 which ensures that Range($L_0$) $\subseteq$ Range($M_l$), or equivalently
Range($L_0$) = Range($\mathcal{P}_{\mathcal{I}_0^\perp} M_l$), at an overwhelming probability. The following theorem illustrates the result:


Theorem 2 (Sampling a Set of Complete Basis by Line 1). Suppose that each column of the incoherent $L_0$ is sampled i.i.d.
by Bernoulli distribution with parameter $d/n$. Let $[L_0]_l$ be the selected columns from $L_0$, i.e., $[L_0]_l = \sum_j \delta_j [L_0]_{:j} e_j^*$, where
$\delta_j \sim \mathrm{Ber}(d/n)$. Then with probability at least $1 - \delta$, we have Range($[L_0]_l$) = Range($L_0$), provided that $d \ge 2\mu r \log(r/\delta)$, where
$\mu$ is the incoherence parameter on the row space of matrix $L_0$.


Proof: The proof of Theorem 2 can be found in the Appendices. ■
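A small simulation can make the statement concrete. The sketch below is an illustration under assumptions stated here rather than taken from this section: the row-space incoherence is computed with the common definition $\mu = (n/r) \max_j \|V^* e_j\|_2^2$, the bound is taken in the form $d \ge 2\mu r \log(r/\delta)$, and the test matrix is a random Gaussian product. It samples columns by $\mathrm{Ber}(d/n)$ and checks that they still span a rank-$r$ subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 50, 500, 5
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Row-space incoherence under the assumed definition mu = (n/r) max_j ||V^* e_j||^2.
_, _, Vt = np.linalg.svd(L0, full_matrices=False)
V = Vt[:r].T                                   # (n, r) right singular vectors
mu = (n / r) * np.max(np.sum(V ** 2, axis=1))

delta = 0.01
d = 2.0 * mu * r * np.log(r / delta)           # assumed form of the sampling bound
p = min(1.0, d / n)                            # Bernoulli parameter d/n, capped at 1

keep = rng.random(n) < p                       # delta_j ~ Ber(d/n)
L_sub = L0[:, keep]
rank_sub = int(np.linalg.matrix_rank(L_sub))

print(f"mu = {mu:.2f}, d = {d:.1f}, sampled k = {int(keep.sum())}, rank = {rank_sub}")
```

For a generic Gaussian matrix any $r$ columns already span the range, so the interesting cases are coherent matrices, where `mu` (and hence `d`) grows.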

Remark 2. Note that a large incoherence parameter on the row space implies that slightly perturbing Lq tremendously changes 
its column space. So we will need more samples in order to capture enough information about the column space of Lq. 


To guarantee the exact recovery of desired subspace from the seed matrix, the rank $r$ of intrinsic matrix should be low
enough compared with the input size (see Theorem 1). Note that Line 1, however, selects the columns by i.i.d. $\mathrm{Ber}(d/n)$, so
that the number $k$ of sampled columns is a random variable. Roughly, $k$ should be around $d$ due to the fact $\mathbb{E}(k) = d$. The
following lemma implies that the magnitude of $k$ typically has the same order as that of parameter $d$ with an overwhelming
probability.

Lemma 6. Let $n$ be the number of Bernoulli trials and suppose that $\Omega \sim \mathrm{Ber}(d/n)$. Then with an overwhelming probability,
$|\Omega| = \Theta(d)$, provided that $d \ge c \log n$ for a numerical constant $c$.

Proof: Take a perturbation $\epsilon$ such that $d/n = m/n + \epsilon$. By scalar Chernoff bound which states that

$\mathbb{P}(|\Omega| \le m) \le e^{-\epsilon^2 n^2/(2d)}$,

if taking $m = d/2$, $\epsilon = d/2n$ and $d \ge c_1 \log n$ for an appropriate constant $c_1$, we have

$\mathbb{P}(|\Omega| \le d/2) \le e^{-d/8} \le n^{-10}$. (30)

In the other direction, by scalar Chernoff bound again which states that

$\mathbb{P}(|\Omega| \ge m) \le e^{-\epsilon^2 n^2/(3d)}$,

if taking $m = 2d$, $\epsilon = -d/n$ and $d \ge c_2 \log n$ for an appropriate constant $c_2$, we obtain

$\mathbb{P}(|\Omega| \ge 2d) \le e^{-d/3} \le n^{-10}$. (31)

Finally, according to (30) and (31), we conclude that $d/2 < |\Omega| < 2d$ with an overwhelming probability, provided that
$d \ge c \log n$ for some constant $c$. ■
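The concentration claimed in Lemma 6 is easy to observe empirically. The following sketch draws the number of successes of $n$ Bernoulli trials with parameter $d/n$ many times and reports how often the count falls in $[d/2, 2d]$; the values of `n`, `d` and `trials` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, trials = 10_000, 200, 1000

# |Omega| for Omega ~ Ber(d/n) over n trials is Binomial(n, d/n).
sizes = rng.binomial(n, d / n, size=trials)

frac_inside = float(np.mean((sizes >= d / 2) & (sizes <= 2 * d)))
print(f"mean |Omega| = {sizes.mean():.1f}, "
      f"min = {sizes.min()}, max = {sizes.max()}, "
      f"fraction in [d/2, 2d] = {frac_inside:.3f}")
```

The standard deviation of the count is about $\sqrt{d}$, so the interval $[d/2, 2d]$ is many standard deviations wide once $d$ is moderately large.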

By Theorems 1 and 2 and Lemma 6, the following theorem justifies the success of Line 3 in Algorithm 1.

Theorem 3 (Exact Recovery of Ground Truth Subspace from Seed Matrix). Suppose that all the conditions in Theorem 1 are
fulfilled for the pair $([L_0]_l, [S_0]_l)$. Then Line 3 of Algorithm 1 exactly recovers the column space of the incoherent $L_0$ and
the column support of $[S_0]_l$ with an overwhelming probability $1 - cn^{-10}$, provided that $d \ge C_0 \mu r \log^3 n$, where $c$ and $C_0$ are
numerical constants, and $\mu$ is the incoherence parameter on the row space of matrix $L_0$.

2) Analysis for $\ell_{2,1}$ Filtering: To justify the outlier identifiability of model (29), it suffices to show that Range($\mathcal{Y}'_i(L_l)$) is
complete, i.e., Range($\mathcal{Y}'_i(L_l)$) = Range($\mathcal{Y}'_i(L_0)$). Actually, this can be proved by the following theorem:

Theorem 4 (Outlier Identifiability of $\ell_{2,1}$ Filtering). Suppose that each row of $L_l$ is sampled i.i.d. by Bernoulli distribution
with parameter $p_0$. Let $\mathcal{Y}'_i(L_l)$ be the selected rows from matrix $L_l$, i.e., $\mathcal{Y}'_i(L_l) = \sum_j \delta_j e_j [L_l]_{j:}$, where $\delta_j \sim \mathrm{Ber}(p_0)$. Then
with probability at least $1 - \delta$, we have rank($\mathcal{Y}'_i(L_l)$) = $r$, or equivalently Range($\mathcal{Y}'_i(L_l)$) = Range($\mathcal{Y}'_i(L_0)$), provided that
$p_0 \ge 2\mu r \log(r/\delta)/m$, where $\mu$ is the incoherence parameter on the column space of matrix $L_0$.

Proof: The proof is similar to that of Theorem 2, where we use a property that $\mu(L_l) = \mu(L_0)$ since Range($L_l$) =
Range($L_0$), by Theorem 3. ■

It is worth noting that, when the matrix is fully observed, model (29) exactly identifies the outliers even without Theorem 4.

D. Complexity Analysis 

In this section, we consider the time complexity of our randomized £ 2,1 filtering algorithm. We analyze our algorithm in 
the case where d = 0(rlog^n). In AlgorithmLinerequires 0{n) time. For Linewhich recovers m x cr(logn)^ seed 
matrix, this step requires 0(r^m log® n) time. Line requires at most Br^mlog^n time due to the QR factorization and 
Line 1^ needs (2r + l)m time due to matrix-matrix multiplication. Thus the overall complexity of our £ 2,1 filtering algorithm 
is at most 0(r^m log® n) + (2r + l)mn + Qr^mn log® n « Qr^mn log® n. As ADMM algorithm requires 0{mn min{m, n}) 
time to run for our model due to SVD or matrix-matrix multiplication in every iteration, and requires many iterations in order
to converge, our algorithm is significantly faster than the state-of-the-art methods.

VI. Applications and Experiments 

As we discuss in Section our model and algorithm have various applications. To show that, this section first relates our 
model to the subspace clustering task with missing values, and then demonstrates the validity of our theory and applications 
by synthetic and real experiments. 

A. Applications to Robust Subspace Clustering with Missing Values 

Subspace clustering aims at clustering data according to the subspaces they lie in. It is well known that many datasets, 
e.g., face m and motion El, a, Q, can be well separated by their different subspaces. So subspace clustering has been 
successfully applied to face recognition m, motion segmentation ll45l . etc. 

Probably one of the most effective subspace clustering models is robust LRR m, na. Suppose that the data matrix M 
contains columns that are from a union of independent subspaces with outliers. The idea of robust LRR is to self-express the 
data, namely, using the clean data themselves as the dictionary, and then find the representation matrix with the lowest rank. 
Mathematically, it is formulated as 

$\min_{Z, L, S} \|Z\|_* + \lambda \|S\|_{2,1}$, s.t. $L = LZ$, $M = L + S$. (32)

After obtaining the optimal solution Z*, we can apply spectral clustering algorithms, such as Normalized Cut, to cluster each 
data point according to the subspaces they lie in. 
Although robust LRR < [3^ has been widely applied to many computer vision tasks m, El, it cannot handle missing 
values, i.e., only a few entries of M are observed. Such a situation commonly occurs because of sensor failures, uncontrolled 
environments, etc. To resolve the issue, in this paper we extend robust LRR by slightly modifying the second constraint: 

$\min_{Z, L, S} \|Z\|_* + \lambda \|S\|_{2,1}$, s.t. $L = LZ$, $\langle M, \omega_{ij} \rangle = \langle L + S, \omega_{ij} \rangle$, $(i,j) \in \mathcal{K}_{obs}$. (33)


A similar model has been proposed by Shi et al. 


min II.Z’IL 

Z,D,S 


which is 

D = DZ + S, {M,e,e*) = {D,e,e*), ii,j)eJCobs- 


(34) 


However, there are two main differences between their model and ours: 1. Our model does not require $\omega_{ij}$ to be standard
basis, thus being more general; 2. Unlike 0, we use clean data as the dictionary to represent themselves. Such a modification 
robustifies the model significantly, as discussed in El, El- 

The extended robust LRR (33) is NP-hard due to its non-convexity, which incurs great difficulty in efficient solution. As an
application of this paper, we show that the solutions to (33) and to (8) are mutually expressible in closed forms:

Claim 1. The pair $((L^*)^\dagger L^*, L^*, S^*)$ is optimal to the extended robust LRR problem (33), if $(L^*, S^*)$ is a solution to the
extended robust MC problem (8). Conversely, suppose that $(Z^*, L^*, S^*)$ is a solution to the extended robust LRR problem (33),
then $(L^*, S^*)$ will be optimal to the extended robust MC problem (8).

Proof: The proof can be found in the Appendices. ■ 

Using the relaxed form (10) to approximate the original problem according to Theorem 1, and then applying Claim 1 to obtain a solution to the extended robust LRR model (33), we are able to robustly cluster subspaces even though a constant fraction of values are unobserved. This is true once the conditions in Theorem 1 are satisfied:

• The low-rankness condition holds if the sum of the subspaces is low-dimensional (less than $O(n/\log^3 n)$);

• The incoherence condition holds if the number of the subspaces is not too large (less than an absolute constant) [49].

The computational cost can be further cut by applying our $\ell_{2,1}$ filtering approach, i.e., Algorithm 1.

Remark 3. Intuitively, Claim 1 is equivalent to a two-step procedure: first completing the data matrix and identifying the outliers by extended robust MC, and then clustering the data by LRR.
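The second step of this two-step procedure can be illustrated with a small NumPy sketch. It assumes that a clean low-rank matrix `L` has already been obtained (here it is simply generated, not produced by a robust MC solver) and forms the representation $Z = V_L V_L^T$ from the skinny SVD of `L`. The function name `lrr_representation`, the toy dimensions, and the tolerance are illustrative assumptions, not part of the paper.

```python
import numpy as np

def lrr_representation(L, tol=1e-10):
    """Return Z = V V^T, where L = U diag(s) V^T is the skinny SVD of L.

    Hypothetical helper: the numerical rank is decided by a relative
    threshold on the singular values.
    """
    _, s, Vt = np.linalg.svd(L, full_matrices=False)
    r = int(np.sum(s > tol * s.max()))   # numerical rank of L
    V = Vt[:r].T                          # right singular vectors, n x r
    return V @ V.T

rng = np.random.default_rng(0)
# Toy data: two independent 2-dimensional subspaces in R^10, 15 points each.
B1 = rng.standard_normal((10, 2))
B2 = rng.standard_normal((10, 2))
L = np.hstack([B1 @ rng.standard_normal((2, 15)),
               B2 @ rng.standard_normal((2, 15))])

Z = lrr_representation(L)
# Largest entry linking points from different subspaces.
off_block = float(np.abs(Z[:15, 15:]).max())
```

For independent subspaces the off-diagonal block of `Z` vanishes up to rounding, so `|Z| + |Z|^T` can serve as an affinity matrix for a spectral clustering step.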


B. Simulations and Experiments 

In this section, we conduct a series of experiments to demonstrate the validity of our theorems, and show possible applications 
of our model and algorithm. 

1) Validity of Regularization Parameter: We first verify the validity of our regularization parameter $\lambda = 1/\sqrt{\log n}$ by simulations. The toy data are designed as follows. We compute $L_0 = XY^T$ as a product of two $n\times r$ i.i.d. $\mathcal{N}(0,1)$ matrices. The non-zero columns of $S_0$ are sampled by a Bernoulli distribution with parameter $a$, and their entries obey i.i.d. $\mathcal{N}(0,1)$. Finally, we construct our observation matrix as $\mathcal{P}_{\Omega_{obs}}(L_0+S_0)$, where the observed indices are selected by i.i.d. $\mathrm{Ber}(p_0)$. We solve model (11) to obtain an optimal solution $(L^*,S^*)$, and then compare it with $(L_0,S_0)$. The distance between the range spaces of $L^*$ and $L_0$ is defined by $\|\mathcal{P}_{U^*}-\mathcal{P}_{U_0}\|_F$, and the distance between the column supports of $S^*$ and $S_0$ is given by the Hamming distance. The experiment is run 10 times and we report the average outputs. Table II illustrates that our choice of the regularization parameter enables model (11) to exactly recover the range space of $L_0$ and the column support of $S_0$ with high probability.
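The data generation and the two error measures described above can be sketched as follows. This is a minimal illustration under stated assumptions: the observations are taken in the standard basis (an entry-wise mask), the column support is read off as the set of non-zero columns, and the names `make_problem`, `range_distance`, and `support_hamming` are hypothetical. No solver for model (11) is included.

```python
import numpy as np

def make_problem(n, r, a, p0, rng):
    """Synthetic instance: low-rank L0, column-sparse S0, Bernoulli mask."""
    X = rng.standard_normal((n, r))
    Y = rng.standard_normal((n, r))
    L0 = X @ Y.T                                   # rank-r ground truth
    outlier_cols = rng.random(n) < a               # Ber(a) column support
    S0 = np.zeros((n, n))
    S0[:, outlier_cols] = rng.standard_normal((n, int(outlier_cols.sum())))
    mask = rng.random((n, n)) < p0                 # Ber(p0) observed entries
    M_obs = np.where(mask, L0 + S0, 0.0)           # unobserved entries set to 0
    lam = 1.0 / np.sqrt(np.log(n))                 # lambda = 1/sqrt(log n)
    return L0, S0, mask, M_obs, lam

def range_distance(L_star, L0, r):
    """Frobenius distance between projectors onto the top-r column spaces."""
    U_star = np.linalg.svd(L_star, full_matrices=False)[0][:, :r]
    U0 = np.linalg.svd(L0, full_matrices=False)[0][:, :r]
    return float(np.linalg.norm(U_star @ U_star.T - U0 @ U0.T, 'fro'))

def support_hamming(S_star, S0, tol=1e-8):
    """Hamming distance between the sets of non-zero columns."""
    supp_star = np.linalg.norm(S_star, axis=0) > tol
    supp0 = np.linalg.norm(S0, axis=0) > tol
    return int(np.sum(supp_star != supp0))

L0, S0, mask, M_obs, lam = make_problem(100, 5, 0.1, 0.8,
                                        np.random.default_rng(0))
```

A solver output `(L_star, S_star)` would then be scored with `range_distance(L_star, L0, 5)` and `support_hamming(S_star, S0)`.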


TABLE II

Exact recovery on problems with different sizes. Here rank($L_0$) = 0.05n, a = 0.1, $p_0$ = 0.8, and $\lambda = 1/\sqrt{\log n}$.

n        dist(Range(L*), Range(L_0))    dist(I*, I_0)
100      1.09 × 10^(-?)                 0
200      1.70 × 10^(-?)                 0
500      4.02 × 10^(-?)                 0
1,000    6.10 × 10^(-?)                 0


Theorem 1 shows that the exact recoverability of model (11) is independent of the magnitudes of the noises. To verify this, Table III records the differences between the ground truth $(L_0,S_0)$ and the output $(L^*,S^*)$ of model (11) under varying noise magnitudes $\mathcal{N}(0,1/n)$, $\mathcal{N}(0,1)$, and $\mathcal{N}(0,n)$. In these experiments our model succeeds at every noise magnitude tested.
TABLE III

Exact recovery on problems with different noise magnitudes. Here n = 200, rank($L_0$) = 0.05n, a = 0.1, $p_0$ = 0.8, and $\lambda = 1/\sqrt{\log n}$.

Magnitude     dist(Range(L*), Range(L_0))    dist(I*, I_0)
N(0, 1/n)     1.98 × 10^(-?)                 0
N(0, 1)       1.50 × 10^(-?)                 0
N(0, n)       3.20 × 10^(-?)                 0


[Figure 2 panels: #Observation = 0.95n^2, #Observation = 0.8n^2, and #Observation = 0.5n^2; horizontal axis: Rank/n, from 0.05 to 0.25.]

Fig. 2. Exact recovery of the extended robust MC on random problems of varying sizes. The white region represents the exact recovery in 10 experiments, 
and black region denotes the failures in all of the experiments. 


2) Exact Recovery from Varying Fractions of Corruptions and Observations: We then test the exact recoverability of our model under varying fractions of corruptions and observations. The data are generated as in the above-mentioned experiments, with data size $n = 200$. We repeat the experiments while decreasing the number of observations. Each simulation is run 10 times, and Figure 2 plots the fraction of correct recoveries: the white region represents exact recovery in all 10 experiments, and the black region denotes failure in all of the experiments. It seems that model (11) succeeds even when the rank of the intrinsic matrix is comparable to $O(n)$, which is consistent with our predicted order $O(n/\log^3 n)$. But as the number of observations decreases, the working range of model (11) shrinks.

3) Speed Advantage of $\ell_{2,1}$ Filtering Algorithm: To test the speed advantage of our $\ell_{2,1}$ filtering algorithm, we compare the running time of ADMM and our filtering Algorithm 1 on the synthetic data. The data are generated as in the above-mentioned simulations, where we change one variable among the set $(n, r, p_0, a)$ each time and fix the others. Table IV lists the CPU times, the distance between Range($L^*$) and Range($L_0$), and the Hamming distance between $\mathcal{I}^*$ and $\mathcal{I}_0$ for the two algorithms. Our $\ell_{2,1}$ filtering approach is roughly 4 to 23 times faster than ADMM in these settings, under a comparable precision.
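The ADMM iterations are not spelled out here. Solvers for objectives of the form $\|L\|_* + \lambda\|S\|_{2,1}$ commonly rely on two proximal maps, sketched below as an assumption about a typical implementation rather than as the authors' code: singular value thresholding for the nuclear norm and column-wise shrinkage for the $\ell_{2,1}$ norm. The function names are hypothetical.

```python
import numpy as np

def singular_value_threshold(A, tau):
    """Proximal map of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def column_shrink(A, tau):
    """Proximal map of tau * l_{2,1} norm: shrink each column toward zero.

    Columns whose Euclidean norm is at most tau are set to zero, which is
    what produces a column-sparse outlier estimate.
    """
    norms = np.linalg.norm(A, axis=0)
    # Guard against division by zero for all-zero columns.
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-300), 0.0)
    return A * scale

A = np.array([[3.0, 0.0],
              [4.0, 0.1]])
shrunk = column_shrink(A, 1.0)          # first column scaled by 0.8, second zeroed
thresholded = singular_value_threshold(np.diag([3.0, 1.0]), 2.0)
```

The full SVD inside `singular_value_threshold` is the expensive step on large matrices, which is the kind of per-iteration cost that a filtering strategy working on a small column subset would reduce.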


TABLE IV

Comparison of the speed between ADMM and our $\ell_{2,1}$ filtering algorithm under varying parameter settings.

Parameter (n, r, p_0, a)    Method                  Time (s)    dist(Range(L*), Range(L_0))    dist(I*, I_0)
(1,000, 1, 0.95, 0.1)       ADMM                    102.89      2.34 × 10^(-?)                 0
                            $\ell_{2,1}$ Filtering  5.86        5.41 × 10^(-?)                 0
(2,000, 1, 0.95, 0.1)       ADMM                    587.53      6.03 × 10^(-?)                 0
                            $\ell_{2,1}$ Filtering  35.49       9.91 × 10^(-?)                 0
(1,000, 10, 0.95, 0.1)      ADMM                    104.12      2.71 × 10^(-?)                 0
                            $\ell_{2,1}$ Filtering  25.75       4.06 × 10^(-?)                 0
(1,000, 1, 0.8, 0.1)        ADMM                    100.10      2.76 × 10^(-?)                 0
                            $\ell_{2,1}$ Filtering  4.33        6.17 × 10^(-?)                 0
(1,000, 1, 0.95, 0.2)       ADMM                    92.95       4.01 × 10^(-?)                 0
                            $\ell_{2,1}$ Filtering  5.09        8.34 × 10^(-?)                 0
TABLE V

Clustering accuracies of our algorithm on the first 5 sequences in the Hopkins 155 database, where there are 5% missing entries.

#Sequence    Motion Number    MC+SSC    MC+LRR    Robust MC+SSC    Robust MC+LRR (Ours)
#1           2                60.48%    80.00%    82.38%           98.10%
#2           3                67.43%    79.46%    73.24%           91.91%
#3           2                75.16%    98.00%    83.01%           98.04%
#4           2                73.09%    67.55%    86.54%           99.21%
#5           2                75.69%    82.41%    75.69%           92.59%


4) Applications to Subspace Clustering with Missing Coefficients: To apply our model to subspace clustering tasks with a fraction of missing values, we conduct experiments on the real Hopkins 155 database. The Hopkins 155 database consists of 155 sequences, each of which contains multiple key points drawn from two or three moving objects. Because the motion trajectory of each rigid body lies in a single subspace, we are able to cluster the points according to the subspaces they lie in. To make the problem more challenging, we randomly corrupt 5% of the columns and remove 5% of the observed coefficients. Table V lists the clustering accuracies of our algorithm on the first 5 sequences in comparison with other approaches. Our approach achieves the highest clustering accuracy on all five sequences (at least 91.91%), even though we cannot observe all of the data values. In addition, the Robust MC based methods outperform their MC based counterparts on four of the five sequences when paired with SSC (with a tie on the fifth) and on all five when paired with LRR, which indicates that our model is more robust.

VII. Conclusions 

In this paper, we investigate the theory, the algorithm, and the applications of our extended robust MC model. In particular, we study the exact recoverability of our model from few observed coefficients w.r.t. general basis, which partially covers the existing results as special cases. With slightly stronger incoherence (ambiguity) conditions, we are able to push the upper bound on the allowed rank from $O(1)$ to $O(n/\log^3 n)$, even when there are around a constant fraction of unobserved coefficients and column corruptions, where $n$ is the sample size. We further suggest a universal choice of the regularization parameter, which is $\lambda = 1/\sqrt{\log n}$. This result waives the necessity of tuning the regularization parameter, so it significantly extends the working range of robust MC. Moreover, we propose the $\ell_{2,1}$ filtering algorithm so as to speed up solving our model numerically, and establish corresponding theoretical guarantees. As an application, we also relate our model to the subspace clustering tasks with missing values so that our theory and algorithm can be immediately applied to the subspace segmentation problem. Our experiments on the synthetic and real data testify to our theories.
Appendix

A. Preliminary Lemmas 

We present several preliminary lemmas here which are critical for our proofs. For those readers who are interested in the 
main body of the proofs, please refer to Sections IV-B and IV-C directly. 

Lemma 7. The optimal solution $(L^*,S^*)$ to the extended robust MC satisfies $S^*\in\Omega_{obs}$.

Proof: Suppose that $S^*\notin\Omega_{obs}$. We have $\|L^*\|_* + \lambda\|\mathcal{P}_{\Omega_{obs}}S^*\|_{2,1} < \|L^*\|_* + \lambda\|S^*\|_{2,1}$. Also, notice that the pair $(L^*, \mathcal{P}_{\Omega_{obs}}S^*)$ is feasible to the extended robust MC problem. Thus we have a contradiction to the optimality of $(L^*,S^*)$. ■

Lemma 8 (Elimination Lemma on Observed Elements). Suppose that any solution $(L^*,S^*)$ to the extended robust MC (11) with observation set $\mathcal{K}_{obs}$ exactly recovers the column space of $L_0$ and the column support of $S_0$, i.e., $\mathrm{Range}(L^*) = \mathrm{Range}(L_0)$ and $\{j : S^*_{:j}\notin\mathrm{Range}(L^*)\} = \mathcal{I}_0$. Then any solution $(L'^*,S'^*)$ to (11) with observation set $\mathcal{K}'_{obs}$ succeeds as well, where $\mathcal{K}_{obs}\subseteq\mathcal{K}'_{obs}$.

Proof: The conclusion holds because the constraints in problem (11) with observation set $\mathcal{K}'_{obs}$ are stronger than the constraints in problem (11) with observation set $\mathcal{K}_{obs}$. ■

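Lemma 9 (Elimination Lemma on Column Support). Suppose that any solution $(L^*,S^*)$ to the extended robust MC (11) with input $\mathcal{R}(M) = \mathcal{R}(L^*) + \mathcal{R}(S^*)$ exactly recovers the column space of $L_0$ and the column support of $S_0$, i.e., $\mathrm{Range}(L^*) = \mathrm{Range}(L_0)$ and $\{j : S^*_{:j}\notin\mathrm{Range}(L^*)\} = \mathcal{I}_0$. Then any solution $(L'^*,S'^*)$ to (11) with input $\mathcal{R}(M') = \mathcal{R}(L^*) + \mathcal{R}\mathcal{P}_{\mathcal{I}}(S^*)$ succeeds as well, where $\mathcal{I}\subseteq\mathcal{I}^* = \mathcal{I}_0$.

Proof: Since $(L'^*,S'^*)$ is the solution of (11) with input matrix $M'$, we have

$$\|L'^*\|_* + \lambda\|S'^*\|_{2,1} \le \|L^*\|_* + \lambda\|\mathcal{P}_{\mathcal{I}}S^*\|_{2,1}.$$

Therefore

$$\begin{aligned}
\|L'^*\|_* + \lambda\|S'^* + \mathcal{P}_{\mathcal{I}^\perp\cap\mathcal{I}_0}S^*\|_{2,1}
&\le \|L'^*\|_* + \lambda\|S'^*\|_{2,1} + \lambda\|\mathcal{P}_{\mathcal{I}^\perp\cap\mathcal{I}_0}S^*\|_{2,1} \\
&\le \|L^*\|_* + \lambda\|\mathcal{P}_{\mathcal{I}}S^*\|_{2,1} + \lambda\|\mathcal{P}_{\mathcal{I}^\perp\cap\mathcal{I}_0}S^*\|_{2,1} \\
&= \|L^*\|_* + \lambda\|S^*\|_{2,1}.
\end{aligned}$$

Note that

$$\mathcal{R}(L'^* + S'^* + \mathcal{P}_{\mathcal{I}^\perp\cap\mathcal{I}_0}S^*) = \mathcal{R}(M' + \mathcal{P}_{\mathcal{I}^\perp\cap\mathcal{I}_0}S^*) = \mathcal{R}(M).$$

Thus $(L'^*, S'^* + \mathcal{P}_{\mathcal{I}^\perp\cap\mathcal{I}_0}S^*)$ is optimal to the problem with input $M$, and by assumption we have

$$\mathrm{Range}(L'^*) = \mathrm{Range}(L^*) = \mathrm{Range}(L_0),$$

$$\{j : [S'^* + \mathcal{P}_{\mathcal{I}^\perp\cap\mathcal{I}_0}S^*]_{:j}\notin\mathrm{Range}(L_0)\} = \mathrm{Supp}(S_0).$$

The second equation implies $\mathcal{I}\subseteq\{j : S'^*_{:j}\notin\mathrm{Range}(L_0)\}$. Suppose that $\mathcal{I}\ne\{j : S'^*_{:j}\notin\mathrm{Range}(L_0)\}$. Then there exists an index $k$ such that $S'^*_{:k}\notin\mathrm{Range}(L_0)$ and $k\notin\mathcal{I}$, i.e., $M'_{:k} = L^*_{:k}\in\mathrm{Range}(L_0)$. Note that $L'^*_{:k}\in\mathrm{Range}(L_0)$. Thus $S'^*_{:k}\in\mathrm{Range}(L_0)$ and we have a contradiction. Thus $\mathcal{I} = \{j : S'^*_{:j}\notin\mathrm{Range}(L_0)\} = \{j : S'^*_{:j}\notin\mathrm{Range}(L'^*)\}$ and the algorithm succeeds. ■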


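Lemma 10 (Matrix (Operator) Bernstein Inequality). Let $X_i\in\mathbb{R}^{m\times n}$, $i = 1,\dots,s$, be independent, zero-mean, matrix-valued random variables. Assume that $M, L\in\mathbb{R}$ are such that $\max\{\|\sum_{i=1}^s\mathbb{E}[X_iX_i^*]\|, \|\sum_{i=1}^s\mathbb{E}[X_i^*X_i]\|\}\le M$ and $\|X_i\|\le L$. Then

$$\mathbb{P}\left[\left\|\sum_{i=1}^s X_i\right\| > t\right] \le (m+n)\exp\left(-\frac{3t^2}{8M}\right)$$

for $t\le M/L$, and

$$\mathbb{P}\left[\left\|\sum_{i=1}^s X_i\right\| > t\right] \le (m+n)\exp\left(-\frac{3t}{8L}\right)$$

for $t\ge M/L$.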

Lemma 9 shows that the success of the algorithm is monotone in $|\mathcal{I}_0|$. Thus, by standard arguments, any guarantee proved for the Bernoulli distribution equivalently holds for the uniform distribution.


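Lemma 11. For any $\mathcal{K}\sim\mathrm{Ber}(p)$, with high probability,

$$\|\mathcal{P}_{\hat{\mathcal{T}}} - p^{-1}\mathcal{P}_{\hat{\mathcal{T}}}\mathcal{R}'\mathcal{P}_{\hat{\mathcal{T}}}\| < \epsilon \quad\text{and}\quad \|\mathcal{P}_{\hat{\mathcal{V}}} - p^{-1}\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{R}'\mathcal{P}_{\hat{\mathcal{V}}}\| < \epsilon,$$

provided that $p\ge C_0\epsilon^{-2}(\mu r\log n_{(1)})/n_{(2)}$ for some numerical constant $C_0 > 0$, where $\mathcal{R}'(\cdot) = \sum_{ij\in\mathcal{K}}\langle\cdot,\omega_{ij}\rangle\omega_{ij}$.

Proof: The proof is in Appendix D. ■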
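A small numerical illustration of the quantity bounded in Lemma 11 follows. It is a sketch under explicit assumptions: the basis is the standard basis (so $\mathcal{R}'$ is entry-wise sampling), and the tangent-space projector has the usual form $\mathcal{P}_T(X) = P_UX + XP_V - P_UXP_V$, which is not restated in this section. Operators on $\mathbb{R}^{m\times n}$ are represented as matrices acting on the column-major vectorization; the helper names are hypothetical.

```python
import numpy as np

def tangent_projector(U, V):
    """Matrix of P_T on column-major vec(X), using vec(AXB) = (B^T kron A) vec(X)."""
    m, n = U.shape[0], V.shape[0]
    PU, PV = U @ U.T, V @ V.T
    return (np.kron(np.eye(n), PU) + np.kron(PV, np.eye(m))
            - np.kron(PV, PU))

def sampling_deviation(U, V, p, rng):
    """Operator norm of P_T - p^{-1} P_T R' P_T for a Ber(p) sampling pattern."""
    m, n = U.shape[0], V.shape[0]
    PT = tangent_projector(U, V)
    kappa = (rng.random(m * n) < p).astype(float)   # Ber(p) indicators
    RPT = kappa[:, None] * PT                        # R' is diagonal in this basis
    return float(np.linalg.norm(PT - (PT @ RPT) / p, 2))

rng = np.random.default_rng(3)
n, r = 20, 2
U = np.linalg.qr(rng.standard_normal((n, r)))[0]    # orthonormal column space
V = np.linalg.qr(rng.standard_normal((n, r)))[0]    # orthonormal row space
PT = tangent_projector(U, V)
dev = sampling_deviation(U, V, 0.8, rng)
dev_full = sampling_deviation(U, V, 1.0, rng)        # all entries observed
```

With every entry observed the deviation vanishes up to rounding; for smaller `p` it grows, in line with the lemma's requirement that `p` be large relative to $\mu r\log n_{(1)}/n_{(2)}$.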

Corollary 1 (iJTll. Assume that Xobs Ber{pf). Then with an overwhelming probability, "P-flP < e + 1 — Po. provided 

that po > Coe~^ (pr log n)/n for some numerical constant Cq > 0. 
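Lemma 12. Suppose that $Z\in\hat{\mathcal{T}}$ and $\mathcal{K}\sim\mathrm{Ber}(p)$. Let $\mathcal{R}'(\cdot) = \sum_{ij\in\mathcal{K}}\langle\cdot,\omega_{ij}\rangle\omega_{ij}$. Then with high probability

$$\max_{ab}|\langle Z - p^{-1}\mathcal{P}_{\hat{\mathcal{T}}}\mathcal{R}'Z, \omega_{ab}\rangle| < \epsilon\max_{ab}|\langle Z,\omega_{ab}\rangle|,$$

provided that $p\ge C_0\epsilon^{-2}(\mu r\log n_{(1)})/n_{(2)}$ for some numerical constant $C_0 > 0$.

Proof: The proof is in Appendix E. ■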

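Lemma 13. Suppose that $Z$ is a fixed matrix and $\mathcal{K}\sim\mathrm{Ber}(p)$. Let $\mathcal{R}'(\cdot) = \sum_{ij\in\mathcal{K}}\langle\cdot,\omega_{ij}\rangle\omega_{ij}$. Then with high probability

$$\|Z - p^{-1}\mathcal{R}'Z\| < C_0'\sqrt{\frac{n_{(1)}\log n_{(1)}}{p}}\max_{ij}|\langle Z,\omega_{ij}\rangle|,$$

provided that $p\ge C_0(\mu\log n_{(1)})/n_{(1)}$ for some small numerical constant $C_0 > 0$.

Proof: The proof is in Appendix F. ■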

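Lemma 14. Let $\mathcal{R}'$ be the projection operator onto the space $\Omega = \mathrm{Span}\{\omega_{ij}, (i,j)\in\mathcal{K}\}$ with any $\mathcal{K}$, let the space $\mathcal{I} = \mathrm{Span}\{\omega_{ij}, j\in\mathcal{J}\}$, and let $\Gamma' = \mathcal{I}\cap\Omega$. Let $\mathcal{J}\sim\mathrm{Ber}(a)$. Then with high probability

$$\|a^{-1}\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{R}'\mathcal{P}_{\mathcal{I}}\mathcal{R}'\mathcal{P}_{\hat{\mathcal{V}}} - \mathcal{P}_{\hat{\mathcal{V}}}\mathcal{R}'\mathcal{P}_{\hat{\mathcal{V}}}\| = \|a^{-1}\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Gamma'}\mathcal{P}_{\hat{\mathcal{V}}} - \mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega}\mathcal{P}_{\hat{\mathcal{V}}}\| < \epsilon,$$

provided that $a\ge C_0\epsilon^{-2}(\mu r\log n_{(1)})/n$ for some numerical constant $C_0 > 0$.

Proof: The proof is in Appendix G. ■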

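Corollary 2. Assume that $\Gamma' = \mathcal{I}\cap\Omega$. Then for any $\mathcal{I}\sim\mathrm{Ber}(a)$ and $\Omega\sim\mathrm{Ber}(p)$, with high probability

$$\|(pa)^{-1}\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Gamma'}\mathcal{P}_{\hat{\mathcal{V}}} - \mathcal{P}_{\hat{\mathcal{V}}}\| < (p^{-1}+1)\epsilon,$$

provided that $a, p\ge C_0\epsilon^{-2}(\mu r\log n_{(1)})/n$ for some numerical constant $C_0 > 0$.

Proof: By Lemma 11 and Lemma 14, we have

$$\|\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega}\mathcal{P}_{\hat{\mathcal{V}}} - p\mathcal{P}_{\hat{\mathcal{V}}}\| < p\epsilon,$$

and

$$\|a^{-1}\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Gamma'}\mathcal{P}_{\hat{\mathcal{V}}} - \mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega}\mathcal{P}_{\hat{\mathcal{V}}}\| < \epsilon.$$

So by the triangle inequality, we have

$$\begin{aligned}
\|a^{-1}\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Gamma'}\mathcal{P}_{\hat{\mathcal{V}}} - p\mathcal{P}_{\hat{\mathcal{V}}}\|
&\le \|\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega}\mathcal{P}_{\hat{\mathcal{V}}} - p\mathcal{P}_{\hat{\mathcal{V}}}\| + \|a^{-1}\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Gamma'}\mathcal{P}_{\hat{\mathcal{V}}} - \mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega}\mathcal{P}_{\hat{\mathcal{V}}}\| \\
&< (p+1)\epsilon.
\end{aligned}$$

That is,

$$\|(pa)^{-1}\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Gamma'}\mathcal{P}_{\hat{\mathcal{V}}} - \mathcal{P}_{\hat{\mathcal{V}}}\| < (p^{-1}+1)\epsilon.$$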


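Corollary 3. Let $\Pi = \mathcal{I}_0\cap\Omega_{obs}$, where $\mathcal{I}_0\sim\mathrm{Ber}(p_1)$. Then with an overwhelming probability $\|\mathcal{P}_{\Pi}\mathcal{P}_{\hat{\mathcal{V}}}\|^2 \le (1-p_1)\epsilon + p_1$, provided that $1-p_1\ge C_0\epsilon^{-2}(\mu r\log n_{(1)})/n$ for some numerical constant $C_0 > 0$.

Proof: Let $\Gamma = \mathcal{I}_0^\perp\cap\Omega_{obs}$. Note that $\mathcal{I}_0^\perp\sim\mathrm{Ber}(1-p_1)$. By Lemma 14 we have $\|(1-p_1)^{-1}\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Gamma}\mathcal{P}_{\hat{\mathcal{V}}} - \mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega_{obs}}\mathcal{P}_{\hat{\mathcal{V}}}\| < \epsilon$, or equivalently

$$\begin{aligned}
&(1-p_1)^{-1}\|\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Gamma}\mathcal{P}_{\hat{\mathcal{V}}} - (1-p_1)\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega_{obs}}\mathcal{P}_{\hat{\mathcal{V}}}\| \\
&= (1-p_1)^{-1}\|\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega_{obs}}\mathcal{P}_{\hat{\mathcal{V}}} - \mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{(\mathcal{I}_0\cap\Omega_{obs})}\mathcal{P}_{\hat{\mathcal{V}}} - (1-p_1)\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega_{obs}}\mathcal{P}_{\hat{\mathcal{V}}}\| \\
&= (1-p_1)^{-1}\|\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{(\mathcal{I}_0\cap\Omega_{obs})}\mathcal{P}_{\hat{\mathcal{V}}} - p_1\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega_{obs}}\mathcal{P}_{\hat{\mathcal{V}}}\| \\
&= (1-p_1)^{-1}\|\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Pi}\mathcal{P}_{\hat{\mathcal{V}}} - p_1\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega_{obs}}\mathcal{P}_{\hat{\mathcal{V}}}\| \\
&< \epsilon.
\end{aligned}$$

Therefore, by the triangle inequality,

$$\|\mathcal{P}_{\Pi}\mathcal{P}_{\hat{\mathcal{V}}}\|^2 = \|\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Pi}\mathcal{P}_{\hat{\mathcal{V}}}\| \le \|\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Pi}\mathcal{P}_{\hat{\mathcal{V}}} - p_1\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega_{obs}}\mathcal{P}_{\hat{\mathcal{V}}}\| + p_1\|\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{P}_{\Omega_{obs}}\mathcal{P}_{\hat{\mathcal{V}}}\| \le (1-p_1)\epsilon + p_1.$$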
B. Proofs of Theorem

To prove the theorem, the following matrix Chernoff bound is invoked in our proof:

Theorem 5 (Matrix Chernoff Bound). Consider a finite sequence $\{X_k\}$ of independent, random, Hermitian matrices. Assume that

$$0 \le \lambda_{\min}(X_k) \le \lambda_{\max}(X_k) \le L.$$

Define $Y = \sum_k X_k$, and $\mu_r$ as the $r$th largest eigenvalue of the expectation $\mathbb{E}Y$, i.e., $\mu_r = \lambda_r(\mathbb{E}Y)$. Then

$$\mathbb{P}\{\lambda_r(Y) > (1-\epsilon)\mu_r\} \ge 1 - r\left[\frac{e^{-\epsilon}}{(1-\epsilon)^{1-\epsilon}}\right]^{\mu_r/L} \ge 1 - re^{-\frac{\mu_r\epsilon^2}{2L}}$$

for $\epsilon\in[0,1)$.

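Lemma 15. Let $X = U\Sigma V^T$ be the skinny SVD of matrix $X$. For any set of coordinates $\Omega$ and any matrix $X\in\mathbb{R}^{m\times n}$, we have $\mathrm{rank}(X_{\Omega:}) = \mathrm{rank}(U_{\Omega:})$ and $\mathrm{rank}(X_{:\Omega}) = \mathrm{rank}(V_{\Omega:})$.

Proof: On one hand,

$$X_{\Omega:} = I_{\Omega:}X = I_{\Omega:}U\Sigma V^T = U_{\Omega:}\Sigma V^T.$$

So $\mathrm{rank}(X_{\Omega:})\le\mathrm{rank}(U_{\Omega:})$. On the other hand, we have

$$X_{\Omega:}V\Sigma^{-1} = U_{\Omega:}.$$

Thus $\mathrm{rank}(U_{\Omega:})\le\mathrm{rank}(X_{\Omega:})$. So $\mathrm{rank}(X_{\Omega:}) = \mathrm{rank}(U_{\Omega:})$.

The second part of the argument can be proved similarly. Indeed, $X_{:\Omega} = U\Sigma V^TI_{:\Omega} = U\Sigma[V^T]_{:\Omega}$ and $\Sigma^{-1}U^TX_{:\Omega} = [V^T]_{:\Omega}$. So $\mathrm{rank}(X_{:\Omega}) = \mathrm{rank}([V^T]_{:\Omega}) = \mathrm{rank}(V_{\Omega:})$, as desired. ■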
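Lemma 15 is easy to check numerically. The sketch below, with arbitrary illustrative sizes and index sets, compares the rank of a row (or column) submatrix of a random low-rank matrix with the rank of the corresponding rows of its singular-vector factors.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 12, 9, 3
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-r matrix

# Skinny SVD: keep only the r leading singular triplets.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, Vt = U[:, :r], Vt[:r]

rows = [0, 4]            # a set of row coordinates
cols = [1, 2, 5, 7]      # a set of column coordinates

def rank(A):
    # Absolute tolerance so that rounding-level singular values count as zero.
    return int(np.linalg.matrix_rank(A, tol=1e-8))

row_ranks = (rank(X[rows]), rank(U[rows]))            # rank(X_{Omega:}), rank(U_{Omega:})
col_ranks = (rank(X[:, cols]), rank(Vt[:, cols]))     # rank(X_{:Omega}), rank(V_{Omega:})
```

Both pairs agree: two generic rows give rank 2, and four generic columns of a rank-3 matrix give rank 3.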

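Now we are ready to prove the theorem.

Proof: We investigate the smallest sampling parameter $d$ such that the sampled columns from $L_0 = \mathcal{P}_{\mathcal{I}_0^\perp}M$ exactly span $\mathrm{Range}(L_0)$ with an overwhelming probability.

Denote by $L_0 = U\Sigma V^T$ the skinny SVD of $L_0$. Let $X = \sum_i\delta_i[V^T]_{:i}e_i^*$ be the random sampling of columns from matrix $V^T$, where $\delta_i\sim\mathrm{Ber}(d/n)$. Define a positive semi-definite matrix

$$Y = XX^* = \sum_{i=1}^n\delta_i[V^T]_{:i}[V^T]_{:i}^*.$$

Obviously, $\sigma_r^2(X) = \lambda_r(Y)$. To invoke the matrix Chernoff bound, we need to estimate $L$ and $\mu_r$ in Theorem 5. Specifically, since

$$\mathbb{E}Y = \sum_{i=1}^n\mathbb{E}\delta_i[V^T]_{:i}[V^T]_{:i}^* = \frac{d}{n}[V^T][V^T]^*,$$

we have $\mu_r = \lambda_r(\mathbb{E}Y) = d/n$. Furthermore, we also have

$$\lambda_{\max}(\delta_i[V^T]_{:i}[V^T]_{:i}^*) \le \|[V^T]_{:i}\|_2^2 \le \frac{\mu r}{n} \triangleq L.$$

By the matrix Chernoff bound,

$$\mathbb{P}\{\sigma_r(X) > 0\} \ge 1 - re^{-\frac{\mu_r}{2L}} = 1 - re^{-\frac{d}{2\mu r}} \ge 1 - \delta,$$

we obtain

$$d \ge 2\mu r\log\frac{r}{\delta}.$$

Note that $\sigma_r(X) > 0$ implies that $\mathrm{rank}([V^T]_{:l}) = \mathrm{rank}([L_0]_{:l}) = r$, where the equality holds due to Lemma 15. Also, $\mathrm{Range}([L_0]_{:l})\subseteq\mathrm{Range}(L_0)$. Thus $\mathrm{Range}([L_0]_{:l}) = \mathrm{Range}(L_0)$. ■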
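The column-sampling step analyzed in this proof can be tried out directly. The sketch below draws each column independently with probability $d/n$ and checks whether the kept columns span the full range of a rank-$r$ matrix. The matrix sizes, the value of `d`, and the helper name `sampled_columns_span` are illustrative assumptions; the random Gaussian factors stand in for an incoherent $L_0$.

```python
import numpy as np

def sampled_columns_span(L0, r, d, rng, tol=1e-8):
    """Keep each column with probability d/n; report whether the kept
    columns have rank r (i.e. span Range(L0)) and how many were kept."""
    n = L0.shape[1]
    keep = rng.random(n) < d / n                  # Ber(d/n) indicators
    if not keep.any():
        return False, 0
    rank_sub = np.linalg.matrix_rank(L0[:, keep], tol=tol)
    return bool(rank_sub == r), int(keep.sum())

rng = np.random.default_rng(2)
n, r = 400, 5
L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

ok, kept = sampled_columns_span(L0, r, d=40, rng=rng)     # about 40 columns
ok_all, kept_all = sampled_columns_span(L0, r, d=n, rng=rng)  # every column kept
```

Only a number of columns on the order of $r$ (up to logarithmic and incoherence factors) is needed, which is why working on the sampled submatrix can be much cheaper than working on all $n$ columns.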
C. Proofs of Claim 1

To prove Claim 1, the following proposition is crucial throughout our proof.

Proposition 1. The solution to the optimization problem

$$\min_Z\|Z\|_*, \quad \text{s.t.} \quad L = LZ, \tag{35}$$

is unique and given by $Z^* = V_LV_L^T$, where $U_L\Sigma_LV_L^T$ is the skinny SVD of $L$.

Proof of Claim 1: We only prove the former part of the claim. The proof for the latter part is similar. Suppose that $(L^*,S^*)$ is a solution to the extended robust MC problem, while $(L^*(L^*)^\dagger, L^*, S^*)$ is not optimal to problem (33). So there exists an optimal solution to (33), termed $(Z_*, L_*, S_*)$, which is strictly better than $(L^*(L^*)^\dagger, L^*, S^*)$. Namely,

$$\|Z_*\|_* + \lambda\|S_*\|_{2,1} < \|L^*(L^*)^\dagger\|_* + \lambda\|S^*\|_{2,1},$$

$$L_* = L_*Z_*, \quad \mathcal{R}(M) = \mathcal{R}(L_* + S_*).$$

Fixing $L$ and $S$ as $L_*$ and $S_*$ in (33), respectively, and by Proposition 1, we have

$$\|Z_*\|_* + \lambda\|S_*\|_{2,1} = \|V_{L_*}V_{L_*}^T\|_* + \lambda\|S_*\|_{2,1} = \mathrm{rank}(L_*) + \lambda\|S_*\|_{2,1}.$$

Furthermore, by the property of the Moore-Penrose pseudo-inverse,

$$\|L^*(L^*)^\dagger\|_* + \lambda\|S^*\|_{2,1} = \mathrm{rank}(L^*) + \lambda\|S^*\|_{2,1}.$$

Thus

$$\mathrm{rank}(L_*) + \lambda\|S_*\|_{2,1} < \mathrm{rank}(L^*) + \lambda\|S^*\|_{2,1},$$

which is contradictory to the optimality of $(L^*,S^*)$ to the extended robust MC problem. So $(L^*(L^*)^\dagger, L^*, S^*)$ is optimal to problem (33). ■
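The two facts about $V_LV_L^T$ that this argument relies on (feasibility for the constraint $L = LZ$, and a nuclear norm equal to the rank of $L$) follow in one line each from the skinny SVD $L = U_L\Sigma_LV_L^T$ with $V_L^TV_L = I_r$, where $r = \mathrm{rank}(L)$. The worked step below is a reader's aid rather than part of the original proof:

```latex
\begin{aligned}
L\,V_L V_L^T &= U_L \Sigma_L V_L^T V_L V_L^T = U_L \Sigma_L V_L^T = L, \\
\|V_L V_L^T\|_* &= \operatorname{tr}(V_L V_L^T) = \operatorname{tr}(V_L^T V_L)
                 = \operatorname{tr}(I_r) = r = \operatorname{rank}(L).
\end{aligned}
```

The second line uses that $V_LV_L^T$ is positive semi-definite, so its nuclear norm equals its trace.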


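D. Proofs of Lemma 11

Now we are prepared to prove Lemma 11.

Proof: For any matrix $X$, we have

$$\mathcal{P}_{\mathcal{X}}X = \sum_{ij}\langle\mathcal{P}_{\mathcal{X}}X,\omega_{ij}\rangle\omega_{ij},$$

where $\mathcal{X}$ is $\hat{\mathcal{V}}$ or $\hat{\mathcal{T}}$. Thus $\mathcal{R}'\mathcal{P}_{\mathcal{X}}X = \sum_{ij}\kappa_{ij}\langle\mathcal{P}_{\mathcal{X}}X,\omega_{ij}\rangle\omega_{ij}$, where the $\kappa_{ij}$'s are i.i.d. Bernoulli variables with parameter $p$. Then

$$\mathcal{P}_{\mathcal{X}}\mathcal{R}'\mathcal{P}_{\mathcal{X}}X = \sum_{ij}\kappa_{ij}\langle\mathcal{P}_{\mathcal{X}}X,\omega_{ij}\rangle\mathcal{P}_{\mathcal{X}}(\omega_{ij}) = \sum_{ij}\kappa_{ij}\langle X,\mathcal{P}_{\mathcal{X}}(\omega_{ij})\rangle\mathcal{P}_{\mathcal{X}}(\omega_{ij}).$$

Namely, $\mathcal{P}_{\mathcal{X}}\mathcal{R}'\mathcal{P}_{\mathcal{X}} = \sum_{ij}\kappa_{ij}\mathcal{P}_{\mathcal{X}}(\omega_{ij})\otimes\mathcal{P}_{\mathcal{X}}(\omega_{ij})$. Similarly, $\mathcal{P}_{\mathcal{X}} = \sum_{ij}\mathcal{P}_{\mathcal{X}}(\omega_{ij})\otimes\mathcal{P}_{\mathcal{X}}(\omega_{ij})$. So we obtain

$$\|p^{-1}\mathcal{P}_{\mathcal{X}}\mathcal{R}'\mathcal{P}_{\mathcal{X}} - \mathcal{P}_{\mathcal{X}}\| = \left\|\sum_{ij}(p^{-1}\kappa_{ij}-1)\mathcal{P}_{\mathcal{X}}(\omega_{ij})\otimes\mathcal{P}_{\mathcal{X}}(\omega_{ij})\right\| \triangleq \left\|\sum_{ij}X_{ij}\right\|,$$

where $X_{ij} = (p^{-1}\kappa_{ij}-1)\mathcal{P}_{\mathcal{X}}(\omega_{ij})\otimes\mathcal{P}_{\mathcal{X}}(\omega_{ij})$ is a zero-mean random variable.

To use Lemma 10, we need to work out $M$ and $L$ therein. Note that

$$\begin{aligned}
\|X_{ij}\| &= \|(p^{-1}\kappa_{ij}-1)\mathcal{P}_{\mathcal{X}}(\omega_{ij})\otimes\mathcal{P}_{\mathcal{X}}(\omega_{ij})\| \\
&\le |p^{-1}\kappa_{ij}-1|\,\|\mathcal{P}_{\mathcal{X}}(\omega_{ij})\otimes\mathcal{P}_{\mathcal{X}}(\omega_{ij})\| \\
&\le \max\{p^{-1}-1, 1\}\|\mathcal{P}_{\mathcal{X}}(\omega_{ij})\|_F^2 \\
&\le \frac{c\mu r}{n_{(2)}p} \triangleq L.
\end{aligned}$$

Furthermore,

$$\begin{aligned}
\left\|\sum_{ij}\mathbb{E}[X_{ij}X_{ij}^*]\right\| &= \left\|\sum_{ij}\mathbb{E}[X_{ij}^*X_{ij}]\right\| \\
&= \left\|\sum_{ij}\mathbb{E}(p^{-1}\kappa_{ij}-1)^2\|\mathcal{P}_{\mathcal{X}}(\omega_{ij})\|_F^2\,\mathcal{P}_{\mathcal{X}}(\omega_{ij})\otimes\mathcal{P}_{\mathcal{X}}(\omega_{ij})\right\| \\
&= (p^{-1}-1)\left\|\sum_{ij}\|\mathcal{P}_{\mathcal{X}}(\omega_{ij})\|_F^2\,\mathcal{P}_{\mathcal{X}}(\omega_{ij})\otimes\mathcal{P}_{\mathcal{X}}(\omega_{ij})\right\| \\
&\le \frac{c\mu r}{n_{(2)}p}\left\|\sum_{ij}\mathcal{P}_{\mathcal{X}}(\omega_{ij})\otimes\mathcal{P}_{\mathcal{X}}(\omega_{ij})\right\| \\
&= \frac{c\mu r}{n_{(2)}p}\|\mathcal{P}_{\mathcal{X}}\| = \frac{c\mu r}{n_{(2)}p} \triangleq M.
\end{aligned}$$

Since $M/L = 1 > \epsilon$, by Lemma 10 we have

$$\begin{aligned}
\mathbb{P}\{\|p^{-1}\mathcal{P}_{\mathcal{X}}\mathcal{R}'\mathcal{P}_{\mathcal{X}} - \mathcal{P}_{\mathcal{X}}\| > \epsilon\}
&\le 2mn\exp\left(-\frac{3\epsilon^2}{8M}\right) \\
&= 2mn\exp\left(-\frac{3\epsilon^2n_{(2)}p}{8c\mu r}\right) \\
&\le 2mn\exp(-CC_0\log n_{(1)}) \\
&\le 2n_{(1)}^{-CC_0+2},
\end{aligned}$$

where the second inequality holds once we have $p\ge C_0\epsilon^{-2}(\mu r\log n_{(1)})/n_{(2)}$. So the proof is completed. ■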


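E. Proofs of Lemma 12

We proceed to prove Lemma 12.

Proof: From the definition of operator $\mathcal{R}'$, we know that

$$\mathcal{R}'(Z) = \sum_{ij\in\mathcal{K}}\langle Z,\omega_{ij}\rangle\omega_{ij} = \sum_{ij}\delta_{ij}\langle Z,\omega_{ij}\rangle\omega_{ij},$$

where the $\delta_{ij}$'s are i.i.d. Bernoulli variables with parameter $p$. Notice that $Z\in\hat{\mathcal{T}}$, so we have

$$Z - p^{-1}\mathcal{P}_{\hat{\mathcal{T}}}\mathcal{R}'Z = \sum_{ij}(1-p^{-1}\delta_{ij})\langle Z,\omega_{ij}\rangle\mathcal{P}_{\hat{\mathcal{T}}}\omega_{ij},$$

and

$$\langle Z - p^{-1}\mathcal{P}_{\hat{\mathcal{T}}}\mathcal{R}'Z,\omega_{ab}\rangle = \sum_{ij}(1-p^{-1}\delta_{ij})\langle Z,\omega_{ij}\rangle\langle\mathcal{P}_{\hat{\mathcal{T}}}\omega_{ij},\omega_{ab}\rangle.$$

We now want to invoke the scalar Bernstein inequality. Let $X_{ij} = (1-p^{-1}\delta_{ij})\langle Z,\omega_{ij}\rangle\langle\mathcal{P}_{\hat{\mathcal{T}}}\omega_{ij},\omega_{ab}\rangle$, which has zero mean. Then

$$\begin{aligned}
|X_{ij}| &= |(1-p^{-1}\delta_{ij})\langle Z,\omega_{ij}\rangle\langle\mathcal{P}_{\hat{\mathcal{T}}}\omega_{ij},\omega_{ab}\rangle| \\
&\le |1-p^{-1}\delta_{ij}|\max_{ij}|\langle Z,\omega_{ij}\rangle|\,\|\mathcal{P}_{\hat{\mathcal{T}}}\omega_{ij}\|_F\|\mathcal{P}_{\hat{\mathcal{T}}}\omega_{ab}\|_F \\
&\le \frac{2\mu r}{n_{(2)}p}\max_{ab}|\langle Z,\omega_{ab}\rangle| \triangleq L.
\end{aligned}$$

Furthermore,

$$\begin{aligned}
\sum_{ij}\mathbb{E}X_{ij}^2 &= \sum_{ij}\mathbb{E}(1-p^{-1}\delta_{ij})^2\langle Z,\omega_{ij}\rangle^2\langle\mathcal{P}_{\hat{\mathcal{T}}}\omega_{ij},\omega_{ab}\rangle^2 \\
&= (p^{-1}-1)\sum_{ij}\langle Z,\omega_{ij}\rangle^2\langle\mathcal{P}_{\hat{\mathcal{T}}}\omega_{ij},\omega_{ab}\rangle^2 \\
&\le (p^{-1}-1)\max_{ij}\langle Z,\omega_{ij}\rangle^2\sum_{ij}\langle\omega_{ij},\mathcal{P}_{\hat{\mathcal{T}}}\omega_{ab}\rangle^2 \\
&= (p^{-1}-1)\max_{ij}\langle Z,\omega_{ij}\rangle^2\|\mathcal{P}_{\hat{\mathcal{T}}}\omega_{ab}\|_F^2 \\
&\le \frac{2\mu r}{n_{(2)}p}\max_{ab}\langle Z,\omega_{ab}\rangle^2 \triangleq M.
\end{aligned}$$

Since $M/L = \max_{ab}|\langle Z,\omega_{ab}\rangle| > \epsilon\max_{ab}|\langle Z,\omega_{ab}\rangle|$, by the scalar Bernstein inequality, we obtain

$$\begin{aligned}
\mathbb{P}\left\{\max_{ab}|\langle Z - p^{-1}\mathcal{P}_{\hat{\mathcal{T}}}\mathcal{R}'Z,\omega_{ab}\rangle| > \epsilon\max_{ab}|\langle Z,\omega_{ab}\rangle|\right\}
&\le 2\exp\left(-\frac{3\epsilon^2\max_{ab}\langle Z,\omega_{ab}\rangle^2}{8M}\right) \\
&= 2\exp\left(-\frac{3\epsilon^2n_{(2)}p}{16\mu r}\right) \\
&\le n_{(1)}^{-10},
\end{aligned}$$

provided that $p\ge C_0\epsilon^{-2}\mu r\log n_{(1)}/n_{(2)}$ for some numerical constant $C_0$. ■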


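F. Proofs of Lemma 13

We are prepared to prove Lemma 13.

Proof: From the definition of operator $\mathcal{R}'$, we know that

$$\mathcal{R}'(Z) = \sum_{ij\in\mathcal{K}}\langle Z,\omega_{ij}\rangle\omega_{ij} = \sum_{ij}\delta_{ij}\langle Z,\omega_{ij}\rangle\omega_{ij},$$

where the $\delta_{ij}$'s are i.i.d. Bernoulli variables with parameter $p$. So

$$Z - p^{-1}\mathcal{R}'Z = \sum_{ij}(1-p^{-1}\delta_{ij})\langle Z,\omega_{ij}\rangle\omega_{ij}.$$

Let $X_{ij} = (1-p^{-1}\delta_{ij})\langle Z,\omega_{ij}\rangle\omega_{ij}$. To use the matrix Bernstein inequality, we need to bound $X_{ij}$ and its variance. To this end, note that

$$\begin{aligned}
\|X_{ij}\| &= |1-p^{-1}\delta_{ij}|\,|\langle Z,\omega_{ij}\rangle|\,\|\omega_{ij}\| \\
&\le p^{-1}\|\omega_{ij}\|_F\max_{ij}|\langle Z,\omega_{ij}\rangle| \\
&= p^{-1}\max_{ij}|\langle Z,\omega_{ij}\rangle| \triangleq L.
\end{aligned}$$

Furthermore,

$$\begin{aligned}
\left\|\sum_{ij}\mathbb{E}X_{ij}X_{ij}^*\right\| &= \left\|\sum_{ij}\mathbb{E}(1-p^{-1}\delta_{ij})^2\langle Z,\omega_{ij}\rangle^2\omega_{ij}\omega_{ij}^*\right\| \\
&\le p^{-1}\max_{ij}\langle Z,\omega_{ij}\rangle^2\left\|\sum_{ij}\omega_{ij}\omega_{ij}^*\right\| \\
&= p^{-1}\max_{ij}\langle Z,\omega_{ij}\rangle^2\|nI_m\| \\
&= \frac{n}{p}\max_{ij}\langle Z,\omega_{ij}\rangle^2.
\end{aligned}$$

Similarly,

$$\left\|\sum_{ij}\mathbb{E}X_{ij}^*X_{ij}\right\| \le \frac{m}{p}\max_{ij}\langle Z,\omega_{ij}\rangle^2.$$

We now let $M = n_{(1)}\max_{ij}\langle Z,\omega_{ij}\rangle^2/p$ and set $t$ as $C_0'\sqrt{p^{-1}n_{(1)}\log n_{(1)}}\max_{ij}|\langle Z,\omega_{ij}\rangle|$. Since $M/L = n_{(1)}\max_{ij}|\langle Z,\omega_{ij}\rangle| > t$, by the matrix Bernstein inequality, we obtain

$$\begin{aligned}
\mathbb{P}\left\{\|Z - p^{-1}\mathcal{R}'Z\| > C_0'\sqrt{\frac{n_{(1)}\log n_{(1)}}{p}}\max_{ij}|\langle Z,\omega_{ij}\rangle|\right\}
&= \mathbb{P}\{\|Z - p^{-1}\mathcal{R}'Z\| > t\} \\
&\le (m+n)\exp\left(-\frac{3t^2}{8M}\right) \\
&\le n_{(1)}^{-10}.
\end{aligned}$$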
G. Proofs of Lemma 14 


We proceed to prove Lemma 14 


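Proof: For any fixed matrix $Z$, it can be seen that

$$\mathcal{R}'\mathcal{P}_{\hat{\mathcal{V}}}Z = \sum_{ij\in\mathcal{K}_{obs}}\langle\mathcal{P}_{\hat{\mathcal{V}}}Z,\omega_{ij}\rangle\omega_{ij} = \sum_{ij}\kappa_{ij}\langle Z,\mathcal{P}_{\hat{\mathcal{V}}}\omega_{ij}\rangle\omega_{ij}.$$

Note that the operators $\mathcal{R}'$ and $\mathcal{P}_{\mathcal{I}}$ are commutative, thus we have

$$\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{R}'\mathcal{P}_{\mathcal{I}}\mathcal{R}'\mathcal{P}_{\hat{\mathcal{V}}}Z = \sum_j\delta_j\sum_i\kappa_{ij}\langle Z,\mathcal{P}_{\hat{\mathcal{V}}}\omega_{ij}\rangle\mathcal{P}_{\hat{\mathcal{V}}}\omega_{ij}.$$

Similarly, $\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{R}'\mathcal{P}_{\hat{\mathcal{V}}}Z = \sum_j\sum_i\kappa_{ij}\langle Z,\mathcal{P}_{\hat{\mathcal{V}}}\omega_{ij}\rangle\mathcal{P}_{\hat{\mathcal{V}}}\omega_{ij}$, and so

$$(a^{-1}\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{R}'\mathcal{P}_{\mathcal{I}}\mathcal{R}'\mathcal{P}_{\hat{\mathcal{V}}} - \mathcal{P}_{\hat{\mathcal{V}}}\mathcal{R}'\mathcal{P}_{\hat{\mathcal{V}}})Z = \sum_j(a^{-1}\delta_j - 1)\sum_i\kappa_{ij}\langle Z,\mathcal{P}_{\hat{\mathcal{V}}}\omega_{ij}\rangle\mathcal{P}_{\hat{\mathcal{V}}}\omega_{ij}.$$

Namely,

$$a^{-1}\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{R}'\mathcal{P}_{\mathcal{I}}\mathcal{R}'\mathcal{P}_{\hat{\mathcal{V}}} - \mathcal{P}_{\hat{\mathcal{V}}}\mathcal{R}'\mathcal{P}_{\hat{\mathcal{V}}} = \sum_j(a^{-1}\delta_j - 1)\sum_i\kappa_{ij}\mathcal{P}_{\hat{\mathcal{V}}}\omega_{ij}\otimes\mathcal{P}_{\hat{\mathcal{V}}}\omega_{ij}.$$

We now plan to use a concentration inequality. Let $X_j = (a^{-1}\delta_j - 1)\sum_i\kappa_{ij}\mathcal{P}_{\hat{\mathcal{V}}}\omega_{ij}\otimes\mathcal{P}_{\hat{\mathcal{V}}}\omega_{ij}$. Notice that $X_j$ is zero-mean and self-adjoint. Denote the set $g = \{\|C_1\|_F\le 1, C_2 = \pm C_1\}$. Then we have

$$\begin{aligned}
\|X_j\| &= \sup_g\langle C_1, X_j(C_2)\rangle \\
&= \sup_g|a^{-1}(\delta_j - a)|\left|\sum_i\kappa_{ij}\langle C_1,\mathcal{P}_{\hat{\mathcal{V}}}(\omega_{ij})\rangle\langle C_2,\mathcal{P}_{\hat{\mathcal{V}}}(\omega_{ij})\rangle\right| \\
&\triangleq |a^{-1}(\delta_j - a)|\sup_g|f(\delta_j)|.
\end{aligned}$$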

According to (|^, 


\\V^C^\\l^=maxY,{Ci,u:.jVV*)^ 

3 

i 

= max X {G* e* Ci, e* VV *) 

^ i 

<maxXl|e*Ci||^||e*LL*||2 


= max|!Ci|||||e*l/L*||^ 

3 

n 


where Gj is a unitary matrix. So we have 


|/(5,)l <Xl(^i>^vK))l \{C2,v^M)\ 

i 

= X(^i’^vK))' 

i 

i 

< WPvGlWloo 

pr 

— 5 

n 
where the first identity holds since $C_2 = \pm C_1$. Thus $\|X_j\|\le\mu ra^{-1}/n \triangleq L$. We now bound $\sum_j\|\mathbb{E}_{\delta_j}X_j^2\|$. Observe that


\\Es,X^\\ < Es^\\X]\\ = = Es,a-^{d, - a) 2 sup/(^,)' 

9 

where the last identity holds because Ci, C 2 and 6j are separable. Furthermore, 

SUp/((5j)2 = sup Kij{Ci,V^{uj,j)){C2,Vy{uJ^j))^ 




Therefore, 


^ \\Es^X]\\ < Es^a-^6, - af^ 


^r(l — a) 


na 


WncAl 


< 


pr 


= M. 


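Since $M/L = 1 > \epsilon$, by the matrix Bernstein inequality,

$$\begin{aligned}
\mathbb{P}\{\|a^{-1}\mathcal{P}_{\hat{\mathcal{V}}}\mathcal{R}'\mathcal{P}_{\mathcal{I}}\mathcal{R}'\mathcal{P}_{\hat{\mathcal{V}}} - \mathcal{P}_{\hat{\mathcal{V}}}\mathcal{R}'\mathcal{P}_{\hat{\mathcal{V}}}\| > \epsilon\}
&= \mathbb{P}\left\{\left\|\sum_jX_j\right\| > \epsilon\right\} \\
&\le (m+n)\exp\left(-\frac{3\epsilon^2}{8M}\right) \\
&= (m+n)\exp\left(-\frac{3\epsilon^2na}{8\mu r}\right) \\
&\le n_{(1)}^{-10},
\end{aligned}$$

provided that $a\ge C_0\epsilon^{-2}(\mu r\log n_{(1)})/n$ for some numerical constant $C_0 > 0$. ■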